Search Results: "bage"

19 November 2021

Bits from Debian: New Debian Developers and Maintainers (September and October 2021)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

17 November 2021

Christoph Berg: PostgreSQL and Undelete

pg_dirtyread Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted or updated. Of course that works only if the table has not yet been cleaned up by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery. Here's an example of pg_dirtyread in action:
# create table foo (id int, t text);
CREATE TABLE
# insert into foo values (1, 'Doc1');
INSERT 0 1
# insert into foo values (2, 'Doc2');
INSERT 0 1
# insert into foo values (3, 'Doc3');
INSERT 0 1
# select * from foo;
 id |  t
----+------
  1 | Doc1
  2 | Doc2
  3 | Doc3
(3 rows)
# delete from foo where id < 3;
DELETE 2
# select * from foo;
 id |  t
----+------
  3 | Doc3
(1 row)
Oops! The first two documents have disappeared. Now let's use pg_dirtyread to look at the table:
# create extension pg_dirtyread;
CREATE EXTENSION
# select * from pg_dirtyread('foo') t(id int, t text);
 id |  t
----+------
  1 | Doc1
  2 | Doc2
  3 | Doc3
(3 rows)
All three documents are still there, but only one of them is visible. pg_dirtyread can also show PostgreSQL's system columns with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:
# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  | xmin | xmax | id |  t
-------+------+------+----+------
 (0,1) | 1577 | 1580 |  1 | Doc1
 (0,2) | 1578 | 1580 |  2 | Doc2
 (0,3) | 1579 |    0 |  3 | Doc3
(3 rows)
Undelete Caveat: I'm not promising any of the ideas quoted below will actually work in practice. There are a few caveats, and a good deal of intricate knowledge about PostgreSQL internals might be required to succeed. Consider consulting your favorite PostgreSQL support channel for advice if you need to recover data on any production system. Don't try this at work. I always had plans to extend pg_dirtyread with some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:
# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;
This is not a true "undelete", though - it just inserts new rows built from the data read from the table. pg_surgery Enter pg_surgery, a new PostgreSQL extension supplied with PostgreSQL 14. It contains two functions to "perform surgery on a damaged relation". As a side effect, they can also make deleted tuples reappear. As I have now discovered, one of the functions, heap_force_freeze(), works nicely with pg_dirtyread. It takes a list of ctids (row locations) that it marks "frozen", and at the same time as "not deleted". Let's apply it to our test table, using the ctids that pg_dirtyread can read:
# create extension pg_surgery;
CREATE EXTENSION
# select heap_force_freeze('foo', array_agg(ctid))
    from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text) where id = 1;
 heap_force_freeze
-------------------

(1 row)
Et voilà, our deleted document is back:
# select * from foo;
 id |  t
----+------
  1 | Doc1
  3 | Doc3
(2 rows)
# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  | xmin | xmax | id |  t
-------+------+------+----+------
 (0,1) |    2 |    0 |  1 | Doc1
 (0,2) | 1578 | 1580 |  2 | Doc2
 (0,3) | 1579 |    0 |  3 | Doc3
(3 rows)
Disclaimer Most importantly, none of the above methods will work if the data you just deleted has already been purged by VACUUM or autovacuum; these actively zero out reclaimed space, so restore from backup to get your data back. Since both pg_dirtyread and pg_surgery operate outside the normal PostgreSQL MVCC machinery, it is easy to create corrupt data using them: duplicated rows, duplicated primary key values, indexes out of sync with tables, broken foreign key constraints, and so on. You have been warned. pg_dirtyread does not (yet) work if the deleted rows contain any toasted values. Possible other approaches include using pageinspect and pg_filedump to retrieve the ctids of deleted rows. Please make sure you have working backups and don't need any of the above.
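For the pageinspect approach mentioned above, here is a rough sketch in Python (using the psycopg2 driver) of how one might list the ctids of deleted-but-not-yet-vacuumed tuples; the connection string, table name and page number are assumptions for illustration, and pageinspect requires superuser privileges:
import psycopg2  # assumed client library; any PostgreSQL driver would do
# List line pointers of tuples on page 0 of "foo" whose xmax is set,
# i.e. tuples deleted (or updated away) but not yet vacuumed.
# The ctid of such a tuple is (page_number, lp).
conn = psycopg2.connect("dbname=test")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pageinspect")
    cur.execute("""
        SELECT lp, t_xmin, t_xmax
          FROM heap_page_items(get_raw_page('foo', 0))
         WHERE NOT (t_xmax = 0)
    """)
    for lp, xmin, xmax in cur.fetchall():
        print("tuple at ctid (0,%s): xmin=%s xmax=%s" % (lp, xmin, xmax))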

31 October 2021

Joachim Breitner: A mostly allocation-free optional type

The Motoko programming language has a built-in data type for optional values, named ?t with values null and ?v (for v : t); this is the equivalent of Haskell's Maybe, OCaml's option or Rust's Option. In this post, I explain how Motoko represents such optional values (almost) without allocation. I neither claim nor expect that any of this is novel; I just hope it's interesting.

Uniform representation The Motoko programming language, designed by Andreas Rossberg and implemented by a pretty cool team at DFINITY, is a high-level language with strict semantics and a strong, structural, equi-recursive type system that compiles down to WebAssembly. Because the type system supports polymorphism, it represents all values uniformly. Simplified for the purpose of this blog post, they are always pointers into a heap object where the first word of the heap object, the heap tag, contains information about the value:
 ┌─────┬─────┐
 │ tag │  …  │
 └─────┴─────┘
The tag is something like array, int64, blob, variant, record, …, and it has two purposes:
  • The garbage collector uses it to understand what kind of object it is looking at, so that it can move it around and follow pointers therein. Variable size objects such as arrays include the object size in a subsequent word of the heap object.
  • Some types have values that may have different shapes on the heap. For example, the ropes used in our text representation can either be a plain blob, or a concatenation node of two blobs. For these types, the tag of the heap object is inspected.

The optional type, naively The optional type (?t) is another example of such a type: Its values can either be null, or ?v for some value v of type t, and the primitive operations on this type are the two introduction forms, an analysis function, and a projection for non-null values:
null : () -> ?t
some : t -> ?t
is_some : ?t -> bool
project : ?t -> t     // must only be used if is_some holds
It is natural to use the heap tag to distinguish the two kinds of values:
  • The null value is a simple one-word heap object with just a tag that says that this is null:
     ┌──────┐
     │ null │
     └──────┘
  • The other values are represented by a two-word object: The tag some, indicating that it is a ?v, and then the payload, which is the pointer that represents the value v:
     ┌──────┬─────────┐
     │ some │ payload │
     └──────┴─────────┘
With this, the four operations can be implemented as follows:
def null():
  ptr <- alloc(1)
  ptr[0] = NULL_TAG
  return ptr
def some(v):
  ptr <- alloc(2)
  ptr[0] = SOME_TAG
  ptr[1] = v
  return ptr
def is_some(p):
  return p[0] == SOME_TAG
def project(p):
  return p[1]
The problem with this implementation is that null() and some(v) both allocate memory. This is bad if they are used very often, and very liberally, and this is the case in Motoko: For example, the iterators used for the for (x in e) construct have type
type Iter<T> = { next : () -> ?T }
and would unavoidably allocate a few words on each iteration. Can we avoid this?

Static values It is quite simple to avoid this for null: Just statically create a single null value and use it every time:
static_null = [NULL_TAG]
def null():
  return static_null
This way, at least null() doesn't allocate. But we gain more: Now every null value is represented by the same pointer, and since the pointer points to static memory, it does not change even with a moving garbage collector, so we can speed up is_some:
def is_some(p):
  return p != static_null
This is not a very surprising change so far, and most compilers out there can and will do at least the static allocation of such singleton constructors. For example, in Haskell, there is only a single empty list ([]) and a single Nothing value in your program, as you can see in my videos exploring the Haskell heap. But can we get rid of the allocation in some(v) too?

Unwrapped optional values If we don't want to allocate in some(v), what can we do? How about simply
def some(v):
  return v
That does not allocate! But it is also broken. At type ??Int, the values null, ?null and ??null should all be distinct, but with this definition they can no longer be told apart. Or, more formally, the following laws should hold for our four primitive operations:
  1. is_some(null()) = false
  2. ∀ v. is_some(some(v)) = true
  3. ∀ p. project(some(p)) = p
But with the new definition of some, we'd get is_some(some(null())) = false. Not good! But note that we only have a problem if we are wrapping a value that is null or some(v). So maybe take the shortcut only then, and write the following:
def some(v):
  if v == static_null or v[0] == SOME_TAG:
    ptr <- alloc(2)
    ptr[0] = SOME_TAG
    ptr[1] = v
    return ptr
  else:
    return v
The definition of is_some can stay as it is: It is still the case that all null values are represented by static_null. But the some values are now no longer all of the same shape, so we have to change project():
def project(p):
  if p[0] == SOME_TAG:
    return p[1]
  else:
    return p
Now things fall into place: A value ?v can, in many cases, be represented the same way as v, and no allocation is needed. Only when v is null or ?null or ??null or ???null etc. do we need to use the some heap object, and thus have to allocate. In fact, it doesn't cost much to avoid allocation even for ?null:
static_some_null = [SOME_TAG, static_null]
def some(v):
  if v == static_null:
    return static_some_null
  else if v[0] == SOME_TAG:
    ptr <- alloc(2)
    ptr[0] = SOME_TAG
    ptr[1] = v
    return ptr
  else:
    return v
So unless one nests the ? type two levels deep, there is no allocation whatsoever, and the only cost is a bit more branching in some and project. That wasn't hard, but quite rewarding, as one can now use idioms like the iterator shown above with greater ease.
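To make the scheme concrete, here is a small Python model of the four operations, a sketch under the assumption that heap objects can be modelled as Python lists and that the two static objects are reused by identity; the names and the list encoding are illustrative, not Motoko's actual runtime representation:
NULL_TAG = "null"
SOME_TAG = "some"
static_null = [NULL_TAG]                     # the single, statically allocated null
static_some_null = [SOME_TAG, static_null]   # statically allocated ?null
def null():
    return static_null
def is_some(p):
    return p is not static_null              # pointer comparison, as described above
def some(v):
    if v is static_null:
        return static_some_null              # ?null needs no allocation either
    if isinstance(v, list) and v and v[0] == SOME_TAG:
        return [SOME_TAG, v]                 # only ?-towers of depth >= 2 allocate
    return v                                 # everything else stays unwrapped
def project(p):
    if isinstance(p, list) and p and p[0] == SOME_TAG:
        return p[1]
    return p
# The three laws hold, including for nested optionals:
assert not is_some(null())
for v in [23, "hi", null(), some(null()), some(some(null()))]:
    assert is_some(some(v))
    assert project(some(v)) is v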

Examples The following table shows the representation of various values before and after. Here [x,…] is a pointed-to, dynamically allocated heap object, {x,…} a statically allocated heap object, N = NULL_TAG and S = SOME_TAG.
type     value    before           after
Null     null     {N}              {N}
?Int     null     {N}              {N}
?Int     ?23      [S,23]           23
??Int    null     {N}              {N}
??Int    ?null    [S,{N}]          {S,{N}}
??Int    ??23     [S,[S,23]]       23
???Int   null     {N}              {N}
???Int   ?null    [S,{N}]          {S,{N}}
???Int   ??null   [S,[S,{N}]]      [S,{S,{N}}]
???Int   ???23    [S,[S,[S,23]]]   23

Concluding remarks
  • If you know what parametric polymorphism is, and wonder how this encoding can work in a language that has that, note that this representation is at the low level of the untyped run-time value representation: We don't need to know the type of v in some(v), merely its heap representation.
  • The uniform representation in Motoko is a bit more involved: The pointers are tagged (by subtracting 1) and some scalar values are represented directly in place (shifted left by 1 bit); a small sketch of this encoding follows after this list. But this is luckily orthogonal to what I showed here.
  • Could Haskell do this for its Maybe type? Not so easily:
    • The Maybe type is not built-in, but rather a standard library-defined algebraic data type. But the compiler could feasibly detect that it is option-like?
    • Haskell is lazy, so at runtime, a value of type Maybe could be Nothing, or Just v, or, and this is crucial, a yet-to-be-evaluated expression, also called a thunk. And one definitely needs to distinguish between a thunk t :: Maybe a that may evaluate to Nothing, and a value Just t :: Maybe a that definitely is Just, but contains a value, which may be a thunk.
    Something like this may work for a strict Maybe type or unlifted sums like (# (# #) a #), but may clash with other tricks in GHC, such as pointer tagging.
  • As I said above, I don't expect this to be novel, and I am happy to add references to prior art here.
  • Given that a heap object with tag SOME_TAG now always encodes a tower ?ⁿnull for some n > 0, one could try to optimize that even more by just storing the n:
     ┌──────┬───┐
     │ some │ n │
     └──────┴───┘
    But that seems inadvisable: It is only a win if you have deep towers, which is probably rare. Worse, now the project function would have to return such a heap object with n decremented, so projection might have to allocate, which goes against the cost model expected by the programmer.
  • If you would rather see code than blog posts, feel free to check out Motoko PR #2115.
  • Does this kind of stuff excite you? Motoko is open source, so your contributions may be welcome!
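As promised above, here is a tiny Python sketch of that uniform value encoding, assuming word-aligned heap addresses; the function names are made up for illustration, and this is not Motoko's actual runtime code:
def encode_scalar(n):
    return n << 1                  # low bit 0 marks a scalar, stored in place
def encode_pointer(addr):
    assert addr % 4 == 0           # heap objects are word-aligned
    return addr - 1                # low bit 1 marks a tagged pointer
def is_pointer(word):
    return word & 1 == 1
def decode(word):
    return word + 1 if is_pointer(word) else word >> 1
assert decode(encode_scalar(21)) == 21
assert decode(encode_pointer(0x1000)) == 0x1000
assert is_pointer(encode_pointer(0x1000)) and not is_pointer(encode_scalar(21))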

6 October 2021

Reproducible Builds: Reproducible Builds in September 2021

The goal behind reproducible builds is to ensure that no deliberate flaws have been introduced during compilation processes, by promising or mandating that identical results are always generated from a given source. This allows multiple third parties to come to an agreement on whether a build was compromised or not via a system of distributed consensus. In these reports we outline the most important things that have been happening in the world of reproducible builds in the past month:
First mentioned in our March 2021 report, Martin Heinz published two blog posts on sigstore, a project that endeavours to offer software signing as a public good, "[the] software-signing equivalent to Let's Encrypt". The first of the two posts, entitled Sigstore: A Solution to Software Supply Chain Security, outlines more about the project and justifies its existence:
Software signing is not a new problem, so there must be some solution already, right? Yes, but signing software and maintaining keys is very difficult especially for non-security folks and UX of existing tools such as PGP leave much to be desired. That's why we need something like sigstore - an easy to use software/toolset for signing software artifacts.
The second post (titled Signing Software The Easy Way with Sigstore and Cosign) goes into some technical details of getting started.
There was an interesting thread in the /r/Signal subreddit that started from the observation that Signal's apk doesn't match with the source code:
Some time ago I checked Signal s reproducibility and it failed. I asked others to test in case I did something wrong, but nobody made any reports. Since then I tried to test the Google Play Store version of the apk against one I compiled myself, and that doesn t match either.

BitcoinBinary.org was announced this month, which aims to be a repository of "Reproducible Build Proofs for Bitcoin Projects":
Most users are not capable of building from source code themselves, but we can at least get them able enough to check signatures and shasums. When reputable people can tell everyone they were able to reproduce the project's build, others at least have a secondary source of validation.

Distribution work Frédéric Pierret announced a new testing service at beta.tests.reproducible-builds.org, showing actual rebuilds of binaries distributed by both the Debian and Qubes distributions. In Debian, 51 reviews of Debian packages were added, 31 were updated and 31 were removed from our database of classified issues this month. As part of this, Chris Lamb refreshed a number of notes, including the build_path_in_record_file_generated_by_pybuild_flit_plugin issue. Elsewhere in Debian, Roland Clobus posted his Fourth status update about reproducible live-build ISO images in Jenkins to our mailing list, which mentions (amongst other things) that:
  • All major configurations are still built regularly using live-build and bullseye.
  • All major configurations are reproducible now; Jenkins is green.
    • I've worked around the issue for the Cinnamon image.
    • The patch was accepted and released within a few hours.
  • My main focus for the last month was on the live-build tool itself.
Related to this, there was continuing discussion on how to embed/encode the build metadata for the Debian live images which were being worked on by Roland Clobus.
Ariadne Conill published another detailed blog post related to various security initiatives within the Alpine Linux distribution. After summarising some conventional security work being done (e.g. with sudo and the release of OpenSSH version 3.0), Ariadne included another section on reproducible builds: "The main blocker [was] determining what to do about storing the build metadata so that a build environment can be recreated precisely". Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

Community news On our website this month, Bernhard M. Wiedemann fixed some broken links [ ] and Holger Levsen made a number of changes to the Who is Involved? page [ ][ ][ ]. On our mailing list, Magnus Ihse Bursie started a thread with the subject Reproducible builds on Java, which begins as follows:
I'm working for Oracle in the Build Group for OpenJDK which is primary responsible for creating a built artifact of the OpenJDK source code. [ ] For the last few years, we have worked on a low-effort, background-style project to make the build of OpenJDK itself building reproducible. We've come far, but there are still issues I'd like to address. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 183, 184 and 185 as well as performed significant triaging of merge requests and other issues in addition to making the following changes:
  • New features:
    • Support a newer format version of the R language's .rds files. [ ]
    • Update tests for OCaml 4.12. [ ]
    • Add a missing format_class import. [ ]
  • Bug fixes:
    • Don't call close_archive when garbage collecting Archive instances, unless open_archive definitely returned successfully. This prevents, for example, an AttributeError where PGPContainer's cleanup routines were rightfully assuming that its temporary directory had actually been created. [ ]
    • Fix (and test) the comparison of the R language's .rdb files after refactoring temporary directory handling. [ ]
    • Ensure that the string "RPM archives" exists in the Debian package description, regardless of whether python3-rpm is installed or not at build time. [ ]
  • Codebase improvements:
    • Use our assert_diff routine in tests/comparators/test_rdata.py. [ ]
    • Move diffoscope.versions to diffoscope.tests.utils.versions. [ ]
    • Reformat a number of modules with Black. [ ][ ]
However, the following changes were also made:
  • Mattia Rizzolo:
    • Fix an autopkgtest caused by the androguard module not being in the (expected) python3-androguard Debian package. [ ]
    • Appease a shellcheck warning in debian/tests/control.sh. [ ]
    • Ignore a warning from h5py in our tests that doesn't concern us. [ ]
    • Drop a trailing .1 from the Standards-Version field as it's not required. [ ]
  • Zbigniew Jędrzejewski-Szmek:
    • Stop using the deprecated distutils.spawn.find_executable utility. [ ][ ][ ][ ][ ]
    • Adjust an LLVM-related test for LLVM version 13. [ ]
    • Update invocations of llvm-objdump. [ ]
    • Adjust a test with a one-byte text file for file version 5.40. [ ]
And, finally, Benjamin Peterson added a --diff-context option to control unified diff context size [ ] and Jean-Romain Garnier fixed the Macho comparator for architectures other than x86-64 [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Drop my package rebuilder prototype as it's not useful anymore. [ ]
    • Schedule old packages in Debian bookworm. [ ]
    • Stop scheduling packages for Debian buster. [ ][ ]
    • Don't include PostgreSQL debug output in package lists. [ ]
    • Detect Python library mismatches during build in the node health check. [ ]
    • Update a note on updating the FreeBSD system. [ ]
  • Mattia Rizzolo:
    • Silence a warning from Git. [ ]
    • Update a setting to reflect that Debian bookworm is the new testing. [ ]
    • Upgrade the PostgreSQL database to version 13. [ ]
  • Roland Clobus (Debian live image generation):
    • Workaround non-reproducible config files in the libxml-sax-perl package. [ ]
    • Use the new DNS for the snapshot service. [ ]
  • Vagrant Cascadian:
    • Note that the armhf architecture also systematically varies by the kernel. [ ]

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

1 October 2021

Junichi Uekawa: Garbage collecting with podman system prune.

Garbage collecting with podman system prune. Tells me it freed 20GB when it seems to have freed 4GB. Wondering where that discrepancy comes from.

Reproducible Builds (diffoscope): diffoscope 186 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 186. This version includes the following changes:
[ Chris Lamb ]
* Don't call close_archive when garbage-collecting Archive instances unless
  open_archive returned successfully. This prevents, amongst others, an
  AttributeError traceback due to PGPContainer's cleanup routines assuming
  that its temporary directory had been created.
  (Closes: reproducible-builds/diffoscope#276)
* Ensure that the string "RPM archives" exists in the package description,
  regardless of whether python3-rpm is installed or not at build time.
[ Jean-Romain Garnier ]
* Fix the LVM Macho comparator for non-x86-64 architectures.
You can find out more by visiting the project homepage.

19 August 2021

Russ Allbery: Review: The Past is Red

Review: The Past is Red, by Catherynne M. Valente
Publisher: Tordotcom
Copyright: 2021
ISBN: 1-250-30112-2
Format: Kindle
Pages: 151
Tetley is nineteen and is the most hated person in Garbagetown. That's kind of horrible, but life is otherwise great. As she puts it:
I'm awfully lucky when you think about it. Garbagetown is the most wonderful place anybody has ever lived in the history of the world, even if you count the Pyramids and New York City and Camelot. I have Grape Crush and Big Bargains and my hibiscus flower, and I can fish like I've got bait for a heart so I hardly ever go hungry, and once I found a ruby ring and a New Mexico license plate inside a bluefin tuna. Everyone says they only hate me because I annihilated hope and butchered our future, but I know better, and anyway, it's a lie. Some people are just born to be despised. The Loathing of Tetley began small and grew bigger and bigger, like the Thames, until it swallowed me whole.
Garbagetown is a giant floating pile of garbage in the middle of the ocean, and it is, so far as anyone knows, the only "land" left in the world. Global warming has flooded everything until the remaining Fuckwits (as their future descendants call everyone who was alive at the time) took to the Misery Boats and searched hopelessly for land. Eventually they realized they could live on top of the now-massive Pacific Garbage Patch and began the Great Sorting, which is fifty years into Tetley's past. All of the types of garbage were moved into their own areas, allowing small micronations of scavengers to form and giving each area its own character.
Candle Hole is the most beautiful place in Garbagetown, which is the most beautiful place in the world. All of the stubs of candles the Fuckwits threw out piled up into hills and mountains and caverns and dells, votive candles and taper candles and tea lights and birthday candles and big fat colorful pillar candles, stacked and somewhat melted into a great crumbling gorgeous warren of wicks and wax. All the houses are cozy little honeycombs melted into hillside, with smooth round windows and low golden ceilings. At night, from far away, Candle Hole looks like a firefly palace. When the wind blows, it smells like cinnamon, and freesia, and cranberries, and lavender, and Fresh Linen Scent, and New Car Smell.
Two things should be obvious from this introduction. First, do not read this book looking for an accurate, technical projection of our environmental future. Or, for that matter, physical realism of any kind. That's not the book that Valente is writing and you'll just frustrate yourself. This is science fiction as concretized metaphor rather than prediction or scientific exploration. We Fuckwits have drowned the world with our greed and left our descendants living in piles of our garbage; you have to suspend disbelief and go with the premise. The second thing is that either you will like Tetley's storytelling style or you will not like this book. I find Valente very hit-and-miss, but this time it worked for me. The language is a bit less over-the-top than Space Opera, and it fits Tetley's insistent, aggressive optimism so well that it carries much of the weight of characterization. Mileage will definitely vary; this is probably a love-it-or-hate-it book.
The Past is Red is divided into two parts. The first part is the short story "The Future is Blue," previously published in Clarkesworld and in Valente's short story collection of the same name. It tells the story of Tetley's early life, how she got her name, and how she became the most hated person in Garbagetown. The second part is much longer and features an older, quieter, more thoughtful, and somewhat more cynical Tetley, more life philosophy, and a bit of more-traditional puzzle science fiction. It lacks some of the bubbly energy of "The Future is Blue" but also features less violence and abuse. The overall work is a long novella or very short novel.
This book has a lot of feelings about the environment, capitalism, greed, and the desire to let other people solve your problems for you, and it is not subtle about any of them. It's satisfying in the way that a good rant is satisfying, not in the way that a coherent political strategy is satisfying. What saves it from being too didactic or self-righteous is Tetley, who is happy to record her own emotions and her moments of wonder and is mostly uninterested in telling other people what to do. The setting sounds darkly depressing, and there are moments where it feels that way in the book, but the core of the story and of Tetley's life philosophy is a type of personal resilience: find the things that make you happy, put one foot in front of the other, and enjoy the world for what it is rather than what it could be or what other people want to convince you it might be. It's also surprisingly funny, particularly if you see the humor in bizarrely-specific piles of the detritus of civilization.
The one place where I will argue with Valente a bit is that The Past is Red thoroughly embraces an environmental philosophy of personal responsibility. The devastating critique aimed at the Fuckwits is universal and undistinguished except slightly by class. Tetley and the other inhabitants of Garbagetown make no distinction between types of Fuckwits or attempt to apportion blame in any way more granular than entire generations and eras. This is probably realistic. I understand why, by Tetley's time, no one is interested in the fine points of history. But the story was written today, for readers in our time, and this type of responsibility collapse is intentionally and carefully constructed by the largest polluters and the people with the most power.
Collective and undifferentiated responsibility means that we're using up our energy fretting about whether we took two showers, which partly deflects attention from the companies, industries, and individuals that are directly responsible for the vast majority of environmental damage. We don't live in a world full of fuckwits; we live in a world run by fuckwits and full of the demoralized, harried, conned, manipulated, overwhelmed, and apathetic, which is a small but important difference. This book is not the right venue to explore that difference, but I wish the vitriol had not been applied quite so indiscriminately.
The rest, though, worked for me. Valente tends to describe things by piling clauses on top of adjectives, which objectively isn't the best writing but it fits everything about Tetley's personality so well that I think this is the book where it works. I found her strange mix of optimism, practicality, and unbreakable inner ethics oddly endearing. "The Future is Blue" is available for free on-line, so if in doubt, read some or all of it, and that should tell you whether you're interested in the expansion. I'm glad I picked it up.
Content warning for physical and sexual abuse of the first-person protagonist, mostly in the first section.
Rating: 7 out of 10

2 August 2021

Colin Watson: Launchpad now runs on Python 3!

After a very long porting journey, Launchpad is finally running on Python 3 across all of our systems. I wanted to take a bit of time to reflect on why my emotional responses to this port differ so much from those of some others who've done large ports, such as the Mercurial maintainers. It's hard to deny that we've had to burn a lot of time on this, which I'm sure has had an opportunity cost, and from one point of view it's essentially running to stand still: there is no single compelling feature that we get solely by porting to Python 3, although it's clearly a prerequisite for tidying up old compatibility code and being able to use modern language facilities in the future. And yet, on the whole, I found this a rewarding project and enjoyed doing it.
Some of this may be because by inclination I'm a maintenance programmer and actually enjoy this sort of thing. My default view tends to be that software version upgrades may be a pain but it's much better to get that pain over with as soon as you can rather than trying to hold back the tide; you can certainly get involved and try to shape where things end up, but rightly or wrongly I can't think of many cases when a righteously indignant user base managed to arrange for the old version to be maintained in perpetuity so that they never had to deal with the new thing (OK, maybe Perl 5 counts here).
I think a more compelling difference between Launchpad and Mercurial, though, may be that very few other people really had a vested interest in what Python version Launchpad happened to be running, because it's all server-side code (aside from some client libraries such as launchpadlib, which were ported years ago). As such, we weren't trying to do this with the internet having Strong Opinions at us. We were doing this because it was obviously the only long-term-maintainable path forward, and in more recent times because some of our library dependencies were starting to drop support for Python 2 and so it was obviously going to become a practical problem for us sooner or later; but if we'd just stayed on Python 2 forever then fundamentally hardly anyone else would really have cared directly, only maybe about some indirect consequences of that. I don't follow Mercurial development so I may be entirely off-base, but if other people were yelling at me about how late my project was to finish its port, that in itself would make me feel more negatively about the project even if I thought it was a good idea. Having most of the pressure come from ourselves rather than from outside meant that wasn't an issue for us.
I'm somewhat inclined to think of the process as an extreme version of paying down technical debt. Moving from Python 2.7 to 3.5, as we just did, means skipping over multiple language versions in one go, and if similar changes had been made more gradually it would probably have felt a lot more like the typical dependency update treadmill. I appreciate why not everyone might want to think of it this way: maybe this is just my own rationalization.
Reflections on porting to Python 3 I'm not going to defend the Python 3 migration process; it was pretty rough in a lot of ways. Nor am I going to spend much effort relitigating it here, as it's already been done to death elsewhere, and as I understand it the core Python developers have got the message loud and clear by now.
At a bare minimum, a lot of valuable time was lost early in Python 3's lifetime hanging on to flag-day-type porting strategies that were impractical for large projects, when it should have been providing for bilingual strategies (code that runs in both Python 2 and 3 for a transitional period), which is where most libraries and most large migrations ended up in practice. For instance, the early advice to library maintainers to maintain two parallel versions or perhaps translate dynamically with 2to3 was entirely impractical in most non-trivial cases and wasn't what most people ended up doing, and yet the idea that 2to3 is all you need still floats around Stack Overflow and the like as a result. (These days, I would probably point people towards something more like Eevee's porting FAQ as somewhere to start.)
There are various fairly straightforward things that people often suggest could have been done to smooth the path, and I largely agree: not removing the u'' string prefix only to put it back in 3.3, fewer gratuitous compatibility breaks in the name of tidiness, and so on. But if I had a time machine, the number one thing I would ask to have been done differently would be introducing type annotations in Python 2 before Python 3 branched off. It's true that it's technically possible to do type annotations in Python 2, but the fact that it's a different syntax that would have to be fixed later is offputting, and in practice it wasn't widely used in Python 2 code. To make a significant difference to the ease of porting, annotations would need to have been introduced early enough that lots of Python 2 library code used them, so that porting code didn't have to be quite so much of an exercise of manually figuring out the exact nature of string types from context.
Launchpad is a complex piece of software that interacts with multiple domains: for example, it deals with a database, HTTP, web page rendering, Debian-format archive publishing, and multiple revision control systems, and there's often overlap between domains. Each of these tends to imply different kinds of string handling. Web page rendering is normally done mainly in Unicode, converting to bytes as late as possible; revision control systems normally want to spend most of their time working with bytes, although the exact details vary; HTTP is of course bytes on the wire, but Python's WSGI interface has some string type subtleties. In practice I found myself thinking about at least four string-like types (that is, things that in a language with a stricter type system I might well want to define as distinct types and restrict conversion between them): bytes, text, ordinary native strings (str in either language, encoded to UTF-8 in Python 2), and native strings with WSGI's encoding rules. Some of these are emergent properties of writing in the intersection of Python 2 and 3, which is effectively a specialized language of its own without coherent official documentation, whose users must intuit its behaviour by comparing multiple sources of information, or by referring to unofficial porting guides: not a very satisfactory situation. Fortunately much of the complexity collapses once it becomes possible to write solely in Python 3.
Some of the difficulties we ran into are not ones that are typically thought of as Python 2-to-3 porting issues, because they were changed later in Python 3's development process.
For instance, the email module was substantially improved in around the 3.2/3.3 timeframe to handle Python 3's bytes/text model more correctly, and since Launchpad sends quite a few different kinds of email messages and has some quite picky tests for exactly what it emits, this entailed a lot of work in our email sending code and in our test suite to account for that. (It took me a while to work out whether we should be treating raw email messages as bytes or as text; bytes turned out to work best.) 3.4 made some tweaks to the implementation of quoted-printable encoding that broke a number of our tests in ways that took some effort to fix, because the tests needed to work on both 2.7 and 3.5. The list goes on. I got quite proficient at digging through Python's git history to figure out when and why some particular bit of behaviour had changed.
One of the thorniest problems was parsing HTTP form data. We mainly rely on zope.publisher for this, which in turn relied on cgi.FieldStorage; but cgi.FieldStorage is badly broken in some situations on Python 3. Even if that bug were fixed in a more recent version of Python, we can't easily use anything newer than 3.5 for the first stage of our port due to the version of the base OS we're currently running, so it wouldn't help much. In the end I fixed some minor issues in the multipart module (and was kindly given co-maintenance of it) and converted zope.publisher to use it. Although this took a while to sort out, it seems to have gone very well.
A couple of other interesting late-arriving issues were around pickle. For most things we normally prefer safer formats such as JSON, but there are a few cases where we use pickle, particularly for our session databases. One of my colleagues pointed out that I needed to remember to tell pickle to stick to protocol 2, so that we'd be able to switch back and forward between Python 2 and 3 for a while; quite right, and we later ran into a similar problem with marshal too. A more surprising problem was that datetime.datetime objects pickled on Python 2 require special care when unpickling on Python 3; rather than the approach that ended up being implemented and documented for Python 3.6, though, I preferred a custom unpickler, both so that things would work on Python 3.5 and so that I wouldn't have to risk affecting the decoding of other pickled strings in the session database.
General lessons Writing this over a year after Python 2's end-of-life date, and certainly nowhere near the leading edge of Python 3 porting work, it's perhaps more useful to look at this in terms of the lessons it has for other large technical debt projects.
I mentioned in my previous article that I used the approach of an enormous and frequently-rebased git branch as a working area for the port, committing often and sometimes combining and extracting commits for review once they seemed to be ready. A port of this scale would have been entirely intractable without a tool of similar power to git rebase, so I'm very glad that we finished migrating to git in 2019. I relied on this right up to the end of the port, and it also allowed for quick assessments of how much more there was to land. git worktree was also helpful, in that I could easily maintain working trees built for each of Python 2 and 3 for comparison.
As is usual for most multi-developer projects, all changes to Launchpad need to go through code review, although we sometimes make exceptions for very simple and obvious changes that can be self-reviewed.
Since I knew from the outset that this was going to generate a lot of changes for review, I therefore structured my work from the outset to try to make it as easy as possible for my colleagues to review it. This generally involved keeping most changes to a somewhat manageable size of 800 lines or less (although this wasn't always possible), and arranging commits mainly according to the kind of change they made rather than their location. For example, when I needed to fix issues with / in Python 3 being true division rather than floor division, I did so in one commit across the various places where it mattered and took care not to mix it with other unrelated changes. This is good practice for nearly any kind of development, but it was especially important here since it allowed reviewers to consider a clear explanation of what I was doing in the commit message and then skim-read the rest of it much more quickly.
It was vital to keep the codebase in a working state at all times, and deploy to production reasonably often: this way if something went wrong the amount of code we had to debug to figure out what had happened was always tractable. (Although I can't seem to find it now to link to it, I saw an account a while back of a company that had taken a flag-day approach instead with a large codebase. It seemed to work for them, but I'm certain we couldn't have made it work for Launchpad.)
I can't speak too highly of Launchpad's test suite, much of which originated before my time. Without a great deal of extensive coverage of all sorts of interesting edge cases at both the unit and functional level, and a corresponding culture of maintaining that test suite well when making new changes, it would have been impossible to be anything like as confident of the port as we were.
As part of the porting work, we split out a couple of substantial chunks of the Launchpad codebase that could easily be decoupled from the core: its Mailman integration and its code import worker. Both of these had substantial dependencies with complex requirements for porting to Python 3, and arranging to be able to do these separately on their own schedule was absolutely worth it. Like disentangling balls of wool, any opportunity you can take to make things less tightly-coupled is probably going to make it easier to disentangle the rest. (I can see a tractable way forward to porting the code import worker, so we may well get that done soon. Our Mailman integration will need to be rewritten, though, since it currently depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.)
Python lessons Our database layer was already in pretty good shape for a port, since at least the modern bits of its table modelling interface were already strict about using Unicode for text columns. If you have any kind of pervasive low-level framework like this, then making it be pedantic at you in advance of a Python 3 port will probably incur much less swearing in the long run, as you won't be trying to deal with quite so many bytes/text issues at the same time as everything else.
Early in our port, we established a standard set of __future__ imports and started incrementally converting files over to them, mainly because we weren't yet sure what else to do and it seemed likely to be helpful. absolute_import was definitely reasonable (and not often a problem in our code), and print_function was annoying but necessary. In hindsight I'm not sure about unicode_literals, though.
For files that only deal with bytes and text it was reasonable enough, but as I mentioned above there were also a number of cases where we needed literals of the language's native str type, i.e. bytes in Python 2 and text in Python 3: this was particularly noticeable in WSGI contexts, but also cropped up in some other surprising places. We generally either omitted unicode_literals or used six.ensure_str in such cases, but it was definitely a bit awkward and maybe I should have listened more to people telling me it might be a bad idea.
A lot of Launchpad's early tests used doctest, mainly in the style where you have text files that interleave narrative commentary with examples. The development team later reached consensus that this was best avoided in most cases, but by then there were far too many doctests to conveniently rewrite in some other form. Porting doctests to Python 3 is really annoying. You run into all the little changes in how objects are represented as text (particularly u'...' versus '...', but plenty of other cases as well); you have next to no tools to do anything useful like skipping individual bits of a doctest that don't apply; using __future__ imports requires the rather obscure approach of adding the relevant names to the doctest's globals in the relevant DocFileSuite or DocTestSuite; dealing with many exception tracebacks requires something like zope.testing.renormalizing; and whatever code refactoring tools you're using probably don't work properly. Basically, don't have done that. It did all turn out to be tractable for us in the end, and I managed to avoid using much in the way of fragile doctest extensions aside from the aforementioned zope.testing.renormalizing, but it was not an enjoyable experience.
Regressions I know of nine regressions that reached Launchpad's production systems as a result of this porting work; of course there were various other regressions caught by CI or in manual testing. (Considering the size of this project, I count it as a resounding success that there were only nine production issues, and that for the most part we were able to fix them quickly.)
Equality testing of removed database objects One of the things we had to do while porting to Python 3 was to implement the __eq__, __ne__, and __hash__ special methods for all our database objects. This was quite conceptually fiddly, because doing this requires knowing each object's primary key, and that may not yet be available if we've created an object in Python but not yet flushed the actual INSERT statement to the database (most of our primary keys are auto-incrementing sequences). We thus had to take care to flush pending SQL statements in such cases in order to ensure that we know the primary keys. However, it's possible to have a problem at the other end of the object lifecycle: that is, a Python object might still be reachable in memory even though the underlying row has been DELETEd from the database. In most cases we don't keep removed objects around for obvious reasons, but it can happen in caching code, and buildd-manager crashed as a result (in fact while it was still running on Python 2). We had to take extra care to avoid this problem.
Debian imports crashed on non-UTF-8 filenames Python 2 has some unfortunate behaviour around passing bytes or Unicode strings (depending on the platform) to shutil.rmtree, and the combination of some porting work and a particular source package in Debian that contained a non-UTF-8 file name caused us to run into this.
The fix was to ensure that the argument passed to shutil.rmtree is a str regardless of Python version. We'd actually run into something similar before: it's a subtle porting gotcha, since it's quite easy to end up passing Unicode strings to shutil.rmtree if you're in the process of porting your code to Python 3, and you might easily not notice if the file names in your tests are all encoded using UTF-8.
lazr.restful ETags We eventually got far enough along that we could switch one of our four appserver machines (we have quite a number of other machines too, but the appservers handle web and API requests) to Python 3 and see what happened. By this point our extensive test suite had shaken out the vast majority of the things that could go wrong, but there was always going to be room for some interesting edge cases. One of the Ubuntu kernel team reported that they were seeing an increase in 412 Precondition Failed errors in some of their scripts that use our webservice API. These can happen when you're trying to modify an existing resource: the underlying protocol involves sending an If-Match header with the ETag that the client thinks the resource has, and if this doesn't match the ETag that the server calculates for the resource then the client has to refresh its copy of the resource and try again. We initially thought that this might be legitimate since it can happen in normal operation if you collide with another client making changes to the same resource, but it soon became clear that something stranger was going on: we were getting inconsistent ETags for the same object even when it was unchanged. Since we'd recently switched a quarter of our appservers to Python 3, that was a natural suspect. Our lazr.restful package provides the framework for our webservice API, and roughly speaking it generates ETags by serializing objects into some kind of canonical form and hashing the result. Unfortunately the serialization was dependent on the Python version in a few ways, and in particular it serialized lists of strings such as lists of bug tags differently: Python 2 used [u'foo', u'bar', u'baz'] where Python 3 used ['foo', 'bar', 'baz']. In lazr.restful 1.0.3 we switched to using JSON for this, removing the Python version dependency and ensuring consistent behaviour between appservers.
Memory leaks This problem took the longest to solve. We noticed fairly quickly from our graphs that the appserver machine we'd switched to Python 3 had a serious memory leak. Our appservers had always been a bit leaky, but now it wasn't so much "a small hole that we can bail occasionally" as "the boat is sinking rapidly":
[Graph: A serious memory leak]
(Yes, this got in the way of working out what was going on with ETags for a while.) I spent ages messing around with various attempts to fix this. Since only a quarter of our appservers were affected, and we could get by on 75% capacity for a while, it wasn't urgent but it was definitely annoying. After spending some quality time with objgraph, for some time I thought traceback reference cycles might be at fault, and I sent a number of fixes to various upstream projects for those (e.g. zope.pagetemplate). Those didn't help the leaks much though, and after a while it became clear to me that this couldn't be the sole problem: Python has a cyclic garbage collector that will eventually collect reference cycles as long as there are no strong references to any objects in them, although it might not happen very quickly. Something else must be going on.
Debugging reference leaks in any non-trivial and long-running Python program is extremely arduous, especially with ORMs that naturally tend to end up with lots of cycles and caches. After a while I formed a hypothesis that zope.server might be keeping a strong reference to something, although I never managed to nail it down more firmly than that. This was an attractive theory as we were already in the process of migrating to Gunicorn for other reasons anyway, and Gunicorn also has a convenient max_requests setting that's good at mitigating memory leaks. Getting this all in place took some time, but once we did we found that everything was much more stable:
[Graph: A rather flat memory graph]
This isn't completely satisfying as we never quite got to the bottom of the leak itself, and it's entirely possible that we've only papered over it using max_requests: I expect we'll gradually back off on how frequently we restart workers over time to try to track this down. However, pragmatically, it's no longer an operational concern.
Mirror prober HTTPS proxy handling After we switched our script servers to Python 3, we had several reports of mirror probing failures. (Launchpad keeps lists of Ubuntu archive and image mirrors, and probes them every so often to check that they're reasonably complete and up to date.) This only affected HTTPS mirrors when probed via a proxy server, support for which is a relatively recent feature in Launchpad and involved some code that we never managed to unit-test properly: of course this is exactly the code that went wrong. Sadly I wasn't able to sort out that gap, but at least the fix was simple.
Non-MIME-encoded email headers As I mentioned above, there were substantial changes in the email package between Python 2 and 3, and indeed between minor versions of Python 3. Our test coverage here is pretty good, but it's an area where it's very easy to have gaps. We noticed that a script that processes incoming email was crashing on messages with headers that were non-ASCII but not MIME-encoded (and indeed then crashing again when it tried to send a notification of the crash!). The only examples of these I looked at were spam, but we still didn't want to crash on them. The fix involved being somewhat more careful about both the handling of headers returned by Python's email parser and the building of outgoing email notifications. This seems to be working well so far, although I wouldn't be surprised to find the odd other incorrect detail in this sort of area.
Failure to handle non-ISO-8859-1 URL-encoded form input Remember how I said that parsing HTTP form data was thorny? After we finished upgrading all our appservers to Python 3, people started reporting that they couldn't post Unicode comments to bugs, which turned out to be only if the attempt was made using JavaScript, and was because I hadn't quite managed to get URL-encoded form data working properly with zope.publisher and multipart. The current standard describes the URL-encoded format for form data as "in many ways an aberrant monstrosity", so this was no great surprise. Part of the problem was some very strange choices in zope.publisher dating back to 2004 or earlier, which I attempted to clean up and simplify. The rest was that Python 2's urlparse.parse_qs unconditionally decodes percent-encoded sequences as ISO-8859-1 if they're passed in as part of a Unicode string, so multipart needs to work around this on Python 2.
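To illustrate the charset point just made, here is a small Python 3 example with a made-up query string, showing how the same percent-encoded bytes decode differently depending on the assumed charset (Python 3's parse_qs at least lets the caller choose):
from urllib.parse import parse_qs
# "%C3%A9" is the UTF-8 encoding of "é"; decoding those bytes as
# ISO-8859-1, which is what Python 2's urlparse.parse_qs did for
# Unicode input, produces mojibake instead.
query = "comment=caf%C3%A9"
print(parse_qs(query))                           # {'comment': ['café']}
print(parse_qs(query, encoding="iso-8859-1"))    # {'comment': ['cafÃ©']}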
I'm still not completely confident that this is correct in all situations, but at least now that we're on Python 3 everywhere the matrix of cases we need to care about is smaller.
Inconsistent marshalling of Loggerhead's disk cache We use Loggerhead for providing web browsing of Bazaar branches. When we upgraded one of its two servers to Python 3, we immediately noticed that the one still on Python 2 was failing to read back its revision information cache, which it stores in a database on disk. (We noticed this because it caused a deployment to fail: when we tried to roll out new code to the instance still on Python 2, Nagios checks had already caused an incompatible cache to be written for one branch from the Python 3 instance.) This turned out to be a similar problem to the pickle issue mentioned above, except this one was with marshal, which I didn't think to look for because it's a relatively obscure module mostly used for internal purposes by Python itself; I'm not sure that Loggerhead should really be using it in the first place. The fix was relatively straightforward, complicated mainly by now needing to cope with throwing away unreadable cache data. Ironically, if we'd just gone ahead and taken the nominally riskier path of upgrading both servers at the same time, we might never have had a problem here.
Intermittent bzr failures Finally, after we upgraded one of our two Bazaar codehosting servers to Python 3, we had a report of intermittent bzr branch hangs. After some digging I found this in our logs:
Traceback (most recent call last):
  ...
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
    self.startWriting()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
    resumeProducing()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
    for p in self.pipes.itervalues():
builtins.AttributeError: 'dict' object has no attribute 'itervalues'
I'd seen this before in our git hosting service: it was a bug in Twisted's Python 3 port, fixed after 20.3.0 but unfortunately after the last release that supported Python 2, so we had to backport that patch. Using the same backport dealt with this. Onwards!

24 May 2021

Antoine Beaupr : Leaving Freenode

The freenode IRC network has been hijacked. TL;DR: move to libera.chat or OFTC.net, as did countless free software projects including Gentoo, CentOS, KDE, Wikipedia, FOSDEM, and more. Debian and the Tor project were already on OFTC and are not affected by this.

What is freenode and why should I care? freenode is the largest remaining IRC network. Before this incident, it had close to 80,000 users, which is small in terms of modern internet history -- even small social networks are larger by multiple orders of magnitude -- but is large in IRC history. The IRC network is also extensively used by the free software community, being the default IRC network on many programs, and used by hundreds if not thousands of free software projects. I have been using freenode since at least 2006. This matters if you care about IRC, the internet, open protocols, decentralisation, and, to a certain extent, federation as well. It also touches on who has the right on network resources: the people who "own" it (through money) or the people who make it work (through their labor). I am biased towards open protocols, the internet, federation, and worker power, and this might taint this analysis.

What happened? It's a long story, but basically:
  1. back in 2017, the former head of staff sold the freenode.net domain (and its related company) to Andrew Lee, "American entrepreneur, software developer and writer", and, rather weirdly, supposedly "crown prince of Korea" although that part is kind of complex (see House of Yi, Yi Won, and Yi Seok). It should be noted the Korean Empire hasn't existed for over a century at this point (even though its flag, also weirdly, remains)
  2. back then, this was only known to the public as this strange PIA and freenode joining forces gimmick. it was suspicious at first, but since the network kept running, no one paid much attention to it. opers of the network were similarly reassured that Lee would have no say in the management of the network
  3. this all changed recently when Lee asserted ownership of the freenode.net domain and started meddling in the operations of the network, according to this summary. this part is disputed, but it is corroborated by almost a dozen former staff members who collectively resigned from the network in protest, after legal threats, when it was obvious freenode was lost.
  4. the departing freenode staff founded a new network, irc.libera.chat, based on the new ircd they were working on with OFTC, solanum
  5. meanwhile, bot armies started attacking all IRC networks: both libera and freenode, but also OFTC and unrelated networks like a small one I help operate. those attacks have mostly stopped as of this writing (2021-05-24 17:30UTC)
  6. on freenode, however, things are taking a turn for the worse: Lee has been accused of taking over a channel, in a grotesque abuse of power; then changing freenode policy to not only justify the abuse, but also remove rules against hateful speech, effectively allowing nazis on the network (update: the change was reverted, but not by Lee)
Update: even though the policy change was reverted, the actual conversations allowed on freenode have already degenerated into toxic garbage. There are also massive channel takeovers (presumably over 700), mostly on channels that were redirecting to libera, but also channels that were still live. Channels that were taken over include #fosdem, #wikipedia, #haskell... Instead of working on the network, the new "so-called freenode" staff is spending effort writing bots and patches to basically automate taking over channels. I run an IRC network and this bot is obviously not standard "services" stuff... This is just grotesque. At this point I agree with this HN comment:
We should stop implicitly legitimizing Andrew Lee's power grab by referring to his dominion as "Freenode". Freenode is a quarter-century-old community that has changed its name to libera.chat; the thing being referred to here as "Freenode" is something else that has illegitimately acquired control of Freenode's old servers and user database, causing enormous inconvenience to the real Freenode.
I don't agree with the suggested name there, let's instead call it "so called freenode" as suggested later in the thread.

What now? I recommend people and organisations move away from freenode as soon as possible. This is a major change: documentation needs to be fixed, and the migration needs to be coordinated. But I do not believe we can trust the new freenode "owners" to operate the network reliably and in good faith. It's also important to use the current momentum to build a critical mass elsewhere so that people don't end up on freenode again by default and find an even more toxic community than your typical run-of-the-mill free software project (which is already not a high bar to meet). Update: people are moving to libera in droves. It's now reaching 18,000 users, which is bigger than OFTC and getting close to the largest, traditional IRC networks (EFnet, Undernet, IRCnet are in the 10-20k users range). so-called freenode is still larger, currently clocking 68,000 users, but that's a huge drop from the previous count which was 78,000 before the exodus began. We're even starting to see the effects of the migration on netsplit.de. Update 2: the isfreenodedeadyet.com site is updated more frequently than netsplit and shows tons more information. It shows 25k online users for libera and 61k for so-called freenode (down from ~78k), and the trend doesn't seem to be stopping for so-called freenode. There's also a list of 400+ channels that have moved out. Keep in mind that such migrations take effect over long periods of time.

Where do I move to? The first thing you should do is to figure out which tool to use for interactive user support. There are multiple alternatives, of course -- this is the internet after all -- but here is a short list of suggestions, in preferred priority order:
  1. irc.libera.chat
  2. irc.OFTC.net
  3. Matrix.org, which bridges with OFTC and (hopefully soon) with libera as well, a modern IRC alternative
  4. XMPP/Jabber also still exists, if you're into that kind of stuff, but I don't think the "chat room" story is great there, at least not as good as Matrix
Basically, the decision tree is this:
  • if you want to stay on IRC:
    • if you are already on many OFTC channels and few freenode channels: move to OFTC
    • if you are more inclined to support the previous freenode staff: move to libera
    • if you care about matrix users (in the short term): move to OFTC
  • if you are ready to leave IRC:
    • if you want the latest and greatest: move to Matrix
    • if you like XML and already use XMPP: move to XMPP
Frankly, at this point, everyone should seriously consider moving to Matrix. The user story is great, the web is a first-class user, it supports E2EE (although XMPP does as well), and it has a lot of momentum behind it. It even bridges well with IRC (which is not the case for XMPP), which helps if you're worried about problems like this happening again. (Indeed, I wouldn't be surprised if similar drama happens on OFTC or libera in the future. The history of IRC is full of such epic controversies, takeovers, sabotage, attacks, technical flamewars, and other silly things. I am not sure, but I suspect a federated model like Matrix might be more resilient to conflicts like this one.) Changing protocols might mean losing a bunch of users, however: not everyone is ready to move to Matrix, for example. Graybeards like me have been using irssi for years, if not decades, and would take quite a bit of convincing to move elsewhere. I have mostly kept my channels on IRC, and moved either to OFTC or libera. In retrospect, I think I might have moved everything to OFTC if I had thought about it more, because almost all of my channels are there. But I kind of expect a lot of the freenode community to move to libera, so I am keeping a socket open there anyways.

How do I move? The first thing you should do is to update documentation, websites, and source code to stop pointing at freenode altogether. This is what I did for feed2exec, for example. You need to let people know in the current channel as well, and possibly shut down the channel on freenode. Since my channels are either small or empty, I took the radical approach of:
  • redirecting the channel to ##unavailable, which is historically the way we show channels have moved to another network
  • making the channel invite-only (which effectively enforces the redirection)
  • kicking everyone out of the channel
  • kickbanning people who rejoin
  • setting the topic to announce the change
In IRC speak, the following commands should do all this:
/msg ChanServ set #anarcat mlock +if ##unavailable
/msg ChanServ clear #anarcat users moving to irc.libera.chat
/msg ChanServ set #anarcat restricted on
/topic #anarcat this channel has moved to irc.libera.chat
If the channel is not registered, the following might work
/mode #anarcat +if ##unavailable
Then you can leave freenode altogether:
/disconnect Freenode unacceptable hijack, policy changes and takeovers. so long and thanks for all the fish.
Keep in mind that some people have been unable to set up such redirections, because the new freenode staff have taken over their channel, in which case you're out of luck... Some people have expressed concern about their private data hosted at freenode as well. If you care about this, you can always talk to NickServ and DROP your nick. Be warned, however, that this assumes good faith of the network operators, which, at this point, is kind of futile. I would assume any data you have registered on there (typically: your NickServ password and email address) to be compromised and leaked. If your password is used elsewhere (tsk, tsk), change it everywhere. Update: there's also another procedure, similar to the above, but with a different approach. Keep in mind that so-called freenode staff are actively hijacking channels for the mere act of mentioning libera in the channel topic, so tread carefully there.

Last words This is a sad time for IRC in general, and freenode in particular. It's a real shame that the previous freenode staff have been kicked out, and it's especially horrible that the new policies of the network are basically making the network open to nazis. I wish things had gone differently: now we have yet another fork in IRC history. While it's not the first time freenode has changed names (it was called OPN before), this time the old freenode is still around, and this will bring much confusion to the world, especially since the new freenode staff is still claiming to support FOSS. I understand there are many sides to this story, and some people were deeply hurt by all this. But for me, it's completely unacceptable to keep pushing your staff so hard that they basically all (except one?) resign in protest. For me, that's leadership failure at the utmost, and a complete disgrace. And of course, I can't in good conscience support or join a network that allows hate speech. Regardless of the fate of whatever we'll call what's left of freenode, maybe it's time for this old IRC thing to die already. It's still a sad day in internet history, but then again, maybe IRC will never die...

19 March 2021

Antoine Beaupré: Securing my IRC (irssi, screen) session with dtach and systemd

A recent vulnerability in GNU screen caused some people to reconsider their commitment to the venerable terminal multiplexing program. GNU screen is probably used by thousands of old sysadmins around the world to run long-standing processes and, particularly, IRC sessions, which are especially vulnerable to arbitrary garbage coming on ... screen, so to speak. So this vulnerability matters, and you should definitely pay attention to it. If you haven't switched to tmux yet, now might be a good time to get your fingers trained. But don't switch to it just yet for your IRC session, and read on for a better, more secure solution. After all, just because this flaw was found in screen doesn't mean one doesn't exist in tmux (or your favorite terminal emulator, for that matter, a much scarier thought).

Hardening my bouncer Back in March 2019, I had already switched away from screen for IRC, not to tmux like many did, but to dtach. I figured that I didn't actually need multiplexing to run my long-running IRC session: I just needed to be able to reattach to the terminal. That's what dtach does. No windows, no panes, and, especially, no way to start a new shell, which is exactly the kind of hardening I need. So I came up with this, to start irssi:
dtach -N /run/$USER/dtach-irssi.socket irssi
To attach:
dtach -a /run/$USER/dtach-irssi.socket
Fairly simple, no? Already one attack vector gone: an evil attacker can't get a new shell through my terminal multiplexer, yay.

Splitting into another user But why stop there! Why am I running irssi as my main user anyways! Let's take the lessons from good UNIX security, and run this as a separate user altogether. This requires me to create another user, say foo-irc:
adduser foo-irc
... and run it as a systemd service, because how else are you going to start this thing anyways, cron? I came up with something like this unit file:
[Unit]
Description=IRC screen session
After=network.target
[Service]
Type=simple
Environment="TERM=screen.xterm-256color"
User=%i
RuntimeDirectory=%i
ExecStart=-/usr/bin/dtach -N /run/%i/dtach-irssi.socket irssi
ExecStop=-/bin/sh -c 'echo /quit stopping service... | exec /usr/bin/dtach -p /run/%i/dtach-irssi.socket'
ExecReload=-/bin/sh -c 'echo /restart | exec /usr/bin/dtach -p /run/%i/dtach-irssi.socket'
Notice this is a service template, because of the %i stuff. I don't actually remember how to enable this thing, but let's say you drop this in /etc/systemd/system/irssi@.service, then you run:
systemctl daemon-reload
And then, not sure about this bit, instantiate that template:
systemctl enable irssi@foo-irc.service
And then this should start the irssi session:
systemctl start irssi@foo-irc.service
To access the session:
sudo -u foo-irc dtach -a /run/foo-irc/dtach-irssi.socket
Obviously, you will probably need to migrate your irssi configuration over, otherwise you'll end up with a blank, old-school irssi. Take a moment to savor the view though. Nostalgia. Ah.

Hardening irssi But this is still not enough. That pesky foo-irc user can still launch arbitrary commands, thanks to irssi /exec (and a generous Perl scripting environment). Let's throw the entire kitchen sink at it and see what sticks. At this point, the unit file becomes too long to just maintain in a blog post (which would be silly, but not unheard of), so just look at this git repository instead.

Password-less remote irssi The neat thing with this hardening is that I now feel comfortable enough with the setup to just add a password-less SSH key to that (basically throwaway) account: the worst that can happen if someone gets a hold of that SSH key is that they land in a heavily sandboxed irssi session. So yay, no password to jump on chat. Like a real client or something. Just make sure to secure the SSH key you'll deploy in authorized_keys with:
restrict,pty,command="dtach -a /run/foo-irc/dtach-irssi.socket" [...]
Obviously, make sure the keys are not writable by the user, by placing them somewhere outside their home directory, which might require hacking at your server's SSH configuration. Otherwise, a compromised user would be able to change their own authorized_keys, which could be bad.

Configuration management And at this point, you may have noticed that you shouldn't actually have followed my instructions to the letter. Instead, just use this neat little Puppet module, which does all of the above but also includes a little wrapper so that mosh still works. It also includes instructions on how to set up your SSH keys. Enjoy, and let me know if (or rather, how) I messed up.

Updates
  1. it seems that dtach is not very active upstream: the last release (0.9) is from 2016, and the last commit (at the time of writing) is from 2017
  2. dtach is not necessarily safer than screen or tmux against arbitrary input from the outside; in fact, there was a vulnerability in dtach (CVE-2012-3368) that led to an attacker accessing stack memory (but maybe not code execution)
  3. after writing the Puppet module and publishing this article, I started to get weird behavior from dtach: I would leave the office at night and then return the next morning to find that I was timed out on servers. From my perspective, irssi noticed only when I re-attached the session:
    09:39:51 -!- Irssi: warning Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 [bitlbee] -!- Irssi: Connection lost to localhost
    09:39:51 -!- Irssi: warning SSL write error: Broken pipe
    09:39:51 [gitter] -!- Irssi: Connection lost to irc.gitter.im
    09:39:51 [OFTC] -!- Irssi: Connection lost to irc.oftc.net
    09:42:34 [IMC] -!- Irssi: Connection lost to irc.indymedia.org
    09:42:34 -!- Irssi: Connection lost to irc.hackint.org
    09:42:34 -!- Irssi: Connection lost to chat.freenode.net
    09:42:34 -!- dtach_away: Set away
    
    from the outside I actually timed out a few minutes after I detached, which also makes for a weird asymmetry:
    22:31:00 -!- anarcat [~anarcat@ocean] has quit [Ping timeout: 250 seconds]
    
    that is eleven hours before the error I get.
  4. the mosh wrapper script seems to not work as well as it did before. Somehow just running mosh $server hangs with a blank screen instead of instantly rejoining the session. I'm not sure it is related to the timeout problem but I did rewrite the wrapper before publication. This is the old version:
    #!/bin/sh
    # inspired by https://serverfault.com/questions/749474/ssh-authorized-keys-command-option-multiple-commands
    command="dtach -a /run/anarcat-irc/dtach-irssi.socket"
    case "$SSH_ORIGINAL_COMMAND" in
        mosh-server*)
        exec mosh-server -- $command
        ;;
        *)
        exec $command
        ;;
    esac
    
    I'm thinking of trying that one out for a while to see if it's related. The weirdest thing is that mosh "un-hangs" if I reattach with plain ssh, so there's definitely something fishy going on here.

6 January 2021

Urvika Gola: Dog tails and tales from 2020

There is no denying the fact that 2020 was a challenging year for everybody, including animals. In India, animals such as dogs, who mostly filled their bellies at street food stalls, were starving as there were no street eateries operating during a long, long lockdown. I was in my home town, New Delhi, working from home like most of us. During the month of July 2020, a dog near the place I live (we fondly called her Brownie) delivered 7 pups in the wilderness. I will never forget my first sight of them: inside a dirty, garbage-filled plot of land were the cutest, cleanest, tiniest balls of fur! All of them had toppled over as the land's surface was uneven. The first instinct was to put them all together on a flat surface. After the search mission completed, this was the sight...
Brownie and her litter put together in the same land where she gave birth.
The next day, I sought help from an animal lover to build a temporary shed for the puppies! We came and changed sheets, cleaned the surroundings and put out fresh water for Brownie, until...
...it started raining heavily one night and we were worried whether the shed would withstand the heavy rainfall.
The next morning, the first thing was to check on the pups. Luckily, the pups were fine; however, the entire area and their bed were damp. Without any second thought, the pups were moved from there to a safe shelter, as it was predicted that the rains would continue for a few more weeks due to the monsoon. Soon, 2 months went by; from watching the pups crawl, open their eyes, and let out their first barks, despite the struggles, it was a beautiful experience.
Brownie weaned the pups off and thus they were ready for adoption! However, my biggest fear was: would anyone come forward to adopt them?

With such thoughts running in parallel in my mind, I started to post about adoption for these 7 pups.
To my biggest surprise, one by one, 5 amazing humans came forward and decided to give these pups a better life than what they would get on the streets of India. I wouldn't be able to express in words how grateful I am to all five dog parents who decided to adopt an Indian street puppy (Indie/Desi puppy), opening up space in their hearts and homes for the pups!

One of the 5 pups was adopted by a person who hails from the USA but is currently working in India. It's so heartwarming to see. In India, despite so much awareness created against breeders and their methods, people still prefer to go for a foreign-bred puppy and disregard Indian/Desi dogs; on the other hand, there are foreigners who value the life of an Indian/Desi dog :)

The 5 Adopted Pups who now have a permanent loving family!

The adorable Robin!
Don and his new big brother!
The naughty and handsome Swayze!
The first pup who got adopted: Pluto
Playful and beautiful Bella!
If this isn't perfect, I don't know what is! God had planned loving families for them and they found them.
However, it's been almost six months now and we haven't found a permanent home for 2 of the 7 pups, but they have the perfect foster family taking care of them right now.

UP FOR ADOPTION Delhi/NCR
Meet Momo and Beesa,
2 out of the 7 pups, who are still waiting for a forever home, currently living with a loving foster family.

Vaccinations and deworming are done.
Female pups, 6 months old.

Now that winter is here, along with one of my friends, who is also fostering the two pups, I arranged gunny sack bags for our street's stray dogs. Two NGOs, namely Lotus Indie Foundation and We Exist Foundation, who work on animal welfare in India, were providing dog beds to ground volunteers like us. We are fortunate that they selected us and helped us make winter less harsh for the stray dogs. However, the cold is such that I also purchased dog coats and put them on a few furries. After hours of running behind dogs and convincing them to wear coats, we managed to put them on a few.
Brownie, the mom dog!
This is a puppy!
She did not let us put a coat on her
Another topic that needs more sensitivity is the sterilization/neutering of dogs; that's a viable method cited by the government to control the dog population and end the suffering of puppies who die under the wheels of cars. However, the implementation of this is worrisome, as it's not as robust as it should be. In a span of 6 months, I managed to get 5 dogs sterilized in my area; the number is not big, but I feel it's a good start as an individual. When I see them now, healthier, happier, running around with no fear of getting attacked by other dogs, I can't express the contentment I feel. For 2 of the dogs (Brownie and her friend) I got it done personally through a private vet. For the other 3, I got it done via the Municipal Corporation, which does it for free for dogs: you have to call them, they come with dog catchers and a van and drop the dogs back in the same area, but volunteers like us have to be very vigilant and active during the whole process to follow up with them.
Dogs getting dropped off after sterilization.
My 2020 ended with this. I am not sure why I am even writing this on my blog, where I have mostly focused on my technical work and experiences, but this pandemic was challenging for everybody and what we planned couldn't happen. Because of 2020, because of the pandemic, I was working from home in my city and I was able to help a few dogs in my area have a healthy life ahead!

What I learned during this entire adventure was that there are a lot of sweet, sensitive, caring people that we are just yet to meet. Along the way, we will also meet insensitive and discouraging people who are unwilling to change or listen; ignore them and continue your good work.

Having one person by your side is so much stronger than ten against you. Silver lining! Hope you all had some positive experiences despite the adversity faced by every single one of us.

1 January 2021

Keith Packard: kgames

Reviving Very Old X Code I've taken the week between Christmas and New Year's off this year. I didn't really have anything serious planned, just taking a break from the usual routine. As often happens, I got sucked into doing a project when I received this simple bug report, Debian Bug #974011:
I have been researching old terminal and X games recently, and realized
that much of the code from 'xmille' originated from the terminal game
'mille', which is part of bsdgames.
...
[The copyright and license information] has been stripped out of all
code in the xmille distribution.  Also, none of the included materials
give credit to the original author, Ken Arnold.
The reason the 'xmille' source is missing copyright and license information from the 'mille' files is that they were copied in before that information was added upstream. Xmille forked from Mille around 1987 or so. I wrote the UI parts for the system I had at the time, which was running X10R4. A very basic port to X11 was done at some point, and that's what Debian has in the archive today. At some point in the 90s, I ported Xmille to the Athena widget set, including several custom widgets in an Xaw extension library, Xkw. It's a lot better than the version in Debian, including displaying the cards correctly (the Debian version has some pretty bad color issues). Here's what the current Debian version looks like:
Fixing The Bug To fix the missing copyright and license information, I imported the mille source code into the "latest" Xaw-based version. The updated mille code had a number of bug fixes and improvements, along with the copyright information. That should have been sufficient to resolve the issue and I could have constructed a suitable source package from whatever bits were needed and uploaded that as a replacement 'xmille' package. However, at some later point, I had actually merged xmille into a larger package, 'kgames', which also included a number of other games, including Reversi, Dominoes, Cribbage and ten Solitaire/Patience variants. (As an aside, those last ten games formed the basis for my Patience Palm Pilot application, which seems to have inspired an Android App of the same name...) So began my yak shaving holiday.
Building Kgames in 2020 Ok, so getting this old source code running should be easy, right? It's just a bunch of C code designed in the 80s and 90s to work on VAXen and their kin. How hard could it be?
  1. Everything was a 32-bit computer back then; pointers and ints were both 32 bits, so you could cast them with wild abandon and cause no problems. Today, testing revealed segfaults in some corners of the code.
  2. It's K&R C code. Remember that the first version of ANSI-C didn't come out until 1989, and it was years later that we could reliably expect to find an ANSI compiler with a random Unix box.
  3. It's X11 code. Fortunately (?), X11 hasn't changed since these applications were written, so at least that part still works just fine. Imagine trying to build Windows or Mac OS code from the early 90's on a modern OS...
I decided to dig in and add prototypes everywhere; that found a lot of pointer/int casting issues, as well as several lurking bugs where the code was just plain broken. After a day or so, I had things building and running and was no longer hitting crashes.
Kgames 1.0 uploaded to Debian New Queue With that done, I decided I could at least upload the working bits to the Debian archive and close the bug reported above. kgames 1.0-2 may eventually get into unstable, presumably once the Debian FTP team realizes just how important fixing this bug is. Or something. Here's what xmille looks like in this version: And here's my favorite solitaire variant too:
But They Look So Old Yeah, Xaw applications have a rustic appearance which may appeal to some, but for people with higher resolution monitors and well seasoned eyesight, squinting at the tiny images and text makes it difficult to enjoy these games today. How hard could it be to update them to use larger cards and scalable fonts?
Xkw version 2.0 I decided to dig in and start hacking the code, starting by adding new widgets to the Xkw library that used cairo for drawing instead of core X calls. Fortunately, the needs of the games were pretty limited, so I only needed to implement a handful of widgets: The other Xkw widgets all got their rendering switched to using cairo, plus using double buffering to make updates look better.
SVG Playing Cards Looking on wikimedia, I found a page referencing a large number of playing cards in SVG form. That led me to Adrian Kennard's playing card web site that let me customize and download a deck of cards, licensed using the CC0 Public Domain license. With these cards, I set about rewriting the Xkw playing card widget, stripping out three different versions of bitmap playing cards and replacing them with just these new SVG versions.
SVG Xmille Cards Ok, so getting regular playing cards was good, but the original goal was to update Xmille, and that has cards hand drawn by me. I could just use those images, import them into cairo and let it scale them to suit on the screen. I decided to experiment with inkscape's bitmap tracing code to see what it could do with them. First, I had to get them into a format that inkscape could parse. That turned out to be a bit tricky; the original format is as a set of X bitmap layers; each layer painting a single color. I ended up hacking the Xmille source code to generate the images using X, then fetching them with XGetImage and walking them to construct XPM format files which could then be fed into the portable bitmap tools to create PNG files that inkscape could handle. The resulting images have a certain charm: I did replace the text in the images to make it readable, otherwise these are untouched from what inkscape generated.
The Results Remember that all of these are applications built using the venerable X toolkit; there are still some non-antialiased graphics visible as the shaped buttons use the X Shape extension. But, all rendering is now done with cairo, so it's all anti-aliased and all scalable. Here's what Xmille looks like after the upgrades: And here's spider: Once kgames 1.0 reaches Debian unstable, I'll upload these new versions.

3 December 2020

Dirk Eddelbuettel: RcppTOML 0.1.7: Support for g++-11, Minor Updates

A new RcppTOML release arrived on CRAN earlier this evening. RcppTOML brings TOML to R. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML, though sadly these may be too ubiquitous now. TOML has been making inroads with projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka "packages") for the Rust language. CRAN had sent us a note that the package no longer compiled under the [unreleased, of course, never change, BDR ;-) ] g++-11 compiler, but they were kind enough to hint that it was only lacking an #include <limits>. These things happen: newer compilers are generally more strict, and that is generally a good thing. (Last year this time we prepped code for the more stringent view on global variables under gcc-10. Earlier g++ versions had similar demands to clarify include headers.) I set up a simple Docker container on Ubuntu 21.04 with g++-11, R, and Rcpp to build the package and make this change (which was of course also PR'ed upstream at cpptoml), plus some other small ones that update the package since the last release roughly 18 months ago. We also switched CI use to the r-ci setup (which I should blog about a little more), removed a bashism and updated a few URLs. The bulleted list of changes in this version follows.

Changes in version 0.1.7 (2020-12-01)
  • Add #include <limits> to header file, also contributed upstream, to permit compilation under the (unreleased) g++-11.
  • Switch the simple cleanup script to sh.
  • Switch CI use to r-ci for focal and bspm.
  • Update several TOML URLs to https://toml.io/en/.

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppTOML page. Please use the GitHub issue tracker for issues and bugreports. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

14 June 2020

Enrico Zini: Culture links

Those of you who watch a lot of Hollywood movies may have noticed a certain trend that has consumed the industry in the last few years. It ...
Video Essay Catalog No. 91 by Kevin B. Lee. Featured on the New York Times and other outlets. Originally published December 13, 2011 on Fandor. https://carpetbagger.blogs.nytimes.com/2011/12/19/staring-in-awe-its-the-spielberg-face/?_r=0
The Korowai cannibals live on top of trees. But is it true?
Bandicoot Cabbagepatch, Bandersnatch Cumberbund, and even Wimbledon Tennismatch: there seem to be endless variations on the name of Benedict Cumberbatch. [...] But how is a normal internet citizen supposed to know, when they hear someone say "I just can't stop looking at gifs of Bombadil Rivendell", that this person isn't talking about some other actor with a name and a voice and cheekbones? Or in other words, what makes for a reasonable variation of the name Bendandsnap Calldispatch?

17 May 2020

Matthew Palmer: Private Key Redaction: UR DOIN IT RONG

Because posting private keys on the Internet is a bad idea, some people like to redact their private keys, so that it looks kinda-sorta like a private key, but it isn't actually giving away anything secret. Unfortunately, due to the way that private keys are represented, it is easy to redact a key in such a way that it doesn't actually redact anything at all. RSA private keys are particularly bad at this, but the problem can (potentially) apply to other keys as well. I'll show you a bit of Inside Baseball with key formats, and then demonstrate the practical implications. Finally, we'll go through a practical worked example from an actual not-really-redacted key I recently stumbled across in my travels.

The Private Lives of Private Keys Here is what a typical private key looks like, when you come across it:
-----BEGIN RSA PRIVATE KEY-----
MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC
CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F
sKoELg==
-----END RSA PRIVATE KEY-----
Obviously, there's some hidden meaning in there; computers don't encrypt things by shouting "BEGIN RSA PRIVATE KEY!", after all. What is between the BEGIN/END lines above is, in fact, a base64-encoded DER format ASN.1 structure representing a PKCS#1 private key. In simple terms, it's a list of numbers; very important numbers. The list of numbers is, in order:
  • A version number (0);
  • The "public modulus", commonly referred to as "n";
  • The "public exponent", or "e" (which is almost always 65,537, for various unimportant reasons);
  • The "private exponent", or "d";
  • The two "private primes", or "p" and "q";
  • Two exponents, which are known as "dmp1" and "dmq1"; and
  • A coefficient, known as "iqmp".

Why Is This a Problem? The thing is, only three of those numbers are actually required in a private key. The rest, whilst useful to allow the RSA encryption and decryption to be more efficient, aren't necessary. The three absolutely required values are e, p, and q. Of the other numbers, most of them are at least about the same size as each of p and q. So of the total data in an RSA key, less than a quarter of the data is required. Let me show you with the above toy key, by breaking it down piece by piece[1]:
  • MGI: DER for "this is a sequence"
  • CAQ: version (0)
  • CxjdTmecltJEz2PLMpS4BX: n
  • AgMBAA: e
  • ECEDKtuwD17gpagnASq1zQTY: d
  • ECCQDVTYVsjjF7IQ: p
  • IJANUYZsIjRsR3: q
  • AgkAkahDUXL0RS: dmp1
  • ECCB78r2SnsJC9: dmq1
  • AghaOK3FsKoELg==: iqmp
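If you would rather dump that list of numbers programmatically than eyeball base64, a PKCS#1 private key is just a DER SEQUENCE of INTEGERs, so a minimal tag-length-value walker will do; this little Python sketch is mine, not the article's, and simply decodes the toy key shown above:
import base64

b64 = (
    "MGICAQACEQCxjdTmecltJEz2PLMpS4BXAgMBAAECEDKtuwD17gpagnASq1zQTYEC"
    "CQDVTYVsjjF7IQIJANUYZsIjRsR3AgkAkahDUXL0RSECCB78r2SnsJC9AghaOK3F"
    "sKoELg=="
)
der = base64.b64decode(b64)

def read_tlv(buf, offset):
    # Read one DER tag-length-value triple, handling short and long
    # form length encodings.
    tag, length = buf[offset], buf[offset + 1]
    offset += 2
    if length & 0x80:
        nbytes = length & 0x7F
        length = int.from_bytes(buf[offset:offset + nbytes], "big")
        offset += nbytes
    return tag, buf[offset:offset + length], offset + length

# The outer element is the SEQUENCE; its value is the run of INTEGERs.
_, body, _ = read_tlv(der, 0)
pos = 0
for name in ["version", "n", "e", "d", "p", "q", "dmp1", "dmq1", "iqmp"]:
    _, value, pos = read_tlv(body, pos)
    print(name, int.from_bytes(value, "big"))
Running it prints version 0 and e as 65537, matching the breakdown above.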
Remember that in order to reconstruct all of these values, all I need are e, p, and q; and e is pretty much always 65,537. So I could redact almost all of this key, and still give away all the important, private bits of this key. Let me show you:
-----BEGIN RSA PRIVATE KEY-----
..............................................................EC
CQDVTYVsjjF7IQIJANUYZsIjRsR3....................................
........
-----END RSA PRIVATE KEY-----
Now, I doubt that anyone is going to redact a key precisely like this; but then again, this isn't a typical RSA key. They usually look a lot more like this:
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEAu6Inch7+mWtKn+leB9uCG3MaJIxRyvC/5KTz2fR+h+GOhqj4
SZJobiVB4FrE5FgC7AnlH6qeRi9MI0s6dt5UWZ5oNIeWSaOOeNO+EJDUkSVf67wj
SNGXlSjGAkPZ0nRJiDjhuPvQmdW53hOaBLk5udxPEQbenpXAzbLJ7wH5ouLQ3nQw
HwpwDNQhF6zRO8WoscpDVThOAM+s4PS7EiK8ZR4hu2toon8Ynadlm95V45wR0VlW
zywgbkZCKa1IMrDCscB6CglQ10M3Xzya3iTzDtQxYMVqhDrA7uBYRxA0y1sER+Rb
yhEh03xz3AWemJVLCQuU06r+FABXJuY/QuAVvQIDAQABAoIBAFqwWVhzWqNUlFEO
PoCVvCEAVRZtK+tmyZj9kU87ORz8DCNR8A+/T/JM17ZUqO2lDGSBs9jGYpGRsr8s
USm69BIM2ljpX95fyzDjRu5C0jsFUYNi/7rmctmJR4s4uENcKV5J/++k5oI0Jw4L
c1ntHNWUgjK8m0UTJIlHbQq0bbAoFEcfdZxd3W+SzRG3jND3gifqKxBG04YDwloy
tu+bPV2jEih6p8tykew5OJwtJ3XsSZnqJMwcvDciVbwYNiJ6pUvGq6Z9kumOavm9
XU26m4cWipuK0URWbHWQA7SjbktqEpxsFrn5bYhJ9qXgLUh/I1+WhB2GEf3hQF5A
pDTN4oECgYEA7Kp6lE7ugFBDC09sKAhoQWrVSiFpZG4Z1gsL9z5YmZU/vZf0Su0n
9J2/k5B1GghvSwkTqpDZLXgNz8eIX0WCsS1xpzOuORSNvS1DWuzyATIG2cExuRiB
jYWIJUeCpa5p2PdlZmBrnD/hJ4oNk4oAVpf+HisfDSN7HBpN+TJfcAUCgYEAyvY7
Y4hQfHIdcfF3A9eeCGazIYbwVyfoGu70S/BZb2NoNEPymqsz7NOfwZQkL4O7R3Wl
Rm0vrWT8T5ykEUgT+2ruZVXYSQCKUOl18acbAy0eZ81wGBljZc9VWBrP1rHviVWd
OVDRZNjz6nd6ZMrJvxRa24TvxZbJMmO1cgSW1FkCgYAoWBd1WM9HiGclcnCZknVT
UYbykCeLO0mkN1Xe2/32kH7BLzox26PIC2wxF5seyPlP7Ugw92hOW/zewsD4nLze
v0R0oFa+3EYdTa4BvgqzMXgBfvGfABJ1saG32SzoWYcpuWLLxPwTMsCLIPmXgRr1
qAtl0SwF7Vp7O/C23mNukQKBgB89DOEB7xloWv3Zo27U9f7nB7UmVsGjY8cZdkJl
6O4LB9PbjXCe3ywZWmJqEbO6e83A3sJbNdZjT65VNq9uP50X1T+FmfeKfL99X2jl
RnQTsrVZWmJrLfBSnBkmb0zlMDAcHEnhFYmHFuvEnfL7f1fIoz9cU6c+0RLPY/L7
n9dpAoGAXih17mcmtnV+Ce+lBWzGWw9P4kVDSIxzGxd8gprrGKLa3Q9VuOrLdt58
++UzNUaBN6VYAe4jgxGfZfh+IaSlMouwOjDgE/qzgY8QsjBubzmABR/KWCYiRqkj
qpWCgo1FC1Gn94gh/+dW2Q8+NjYtXWNqQcjRP4AKTBnPktEvdMA=
-----END RSA PRIVATE KEY-----
People typically redact keys by deleting whole lines, and usually replacing them with [...] and the like. But only about 345 of those 1588 characters (excluding the header and footer) are required to construct the entire key. You can redact about 4/5ths of that giant blob of stuff, and your private parts (or at least, those of your key) are still left uncomfortably exposed.

But Wait! There's More! Remember how I said that everything in the key other than e, p, and q could be derived from those three numbers? Let's talk about one of those numbers: n. This is known as the public modulus (because, along with e, it is also present in the public key). It is very easy to calculate: n = p * q. It is also very early in the key (the second number, in fact). Since n = p * q, it follows that q = n / p. Thus, as long as the key is intact up to p, you can derive q by simple division.
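To make that arithmetic concrete, here is a small Python sketch of my own (with toy primes far too small for real use, not values taken from the post) showing that everything else in a PKCS#1 key falls out of e, p and q, and that p together with the public modulus is enough to recover q:
from math import gcd

e = 65537
p = 2147483647            # 2**31 - 1, a toy prime
q = 2305843009213693951   # 2**61 - 1, another toy prime

n = p * q                 # the public modulus
phi = (p - 1) * (q - 1)
assert gcd(e, phi) == 1
d = pow(e, -1, phi)       # private exponent, via modular inverse (Python 3.8+)
dmp1 = d % (p - 1)
dmq1 = d % (q - 1)
iqmp = pow(q, -1, p)      # the CRT coefficient

# The article's point: if a redaction leaves n and p readable,
# recovering q is a single division.
assert n // p == q
The Ruby session later in the post does essentially the same derivation, via OpenSSL::PKey::RSA.from_factors.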

Real World Redaction At this point, I'd like to introduce an acquaintance of mine: Mr. Johan Finn. He is the proud owner of the GitHub repo johanfinn/scripts. For a while, his repo contained a script that contained a poorly-redacted private key. He has since deleted it, by making a new commit, but of course, because git never really deletes anything, it's still available. Of course, Mr. Finn may delete the repo, or force-push a new history without that commit, so here is the redacted private key, with a bit of the surrounding shell script, for our illustrative pleasure:
#Add private key to .ssh folder
cd /home/johan/.ssh/
echo  "-----BEGIN RSA PRIVATE KEY-----
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
 
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::.::
:::::::::::::::::::::::::::.::::::::::::::::::::::::::::::::::::
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLlL
 
 
 
YYYYYYYYYYYYYYYYYYYYYyYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
-----END RSA PRIVATE KEY-----" >> id_rsa
Now, if you try to reconstruct this key by removing the obvious garbage lines (the ones that are all repeated characters, some of which aren't even valid base64 characters), it still isn't a key; at least, openssl pkey doesn't want anything to do with it. The key is very much still in there, though, as we shall soon see. Using a gem I wrote and a quick bit of Ruby, we can extract a complete private key. The irb session looks something like this:
>> require "derparse"
>> b64 = <<EOF
MIIJKgIBAAKCAgEAxEVih1JGb8gu/Fm4AZh+ZwJw/pjzzliWrg4mICFt1g7SmIE2
TCQMKABdwd11wOFKCPc/UzRH/fHuQcvWrpbOSdqev/zKff9iedKw/YygkMeIRaXB
fYELqvUAOJ8PPfDm70st9GJRhjGgo5+L3cJB2gfgeiDNHzaFvapRSU0oMGQX+kI9
ezsjDAn+0Pp+r3h/u1QpLSH4moRFGF4omNydI+3iTGB98/EzuNhRBHRNq4oBV5SG
Pq/A1bem2ninnoEaQ+OPESxYzDz3Jy9jV0W/6LvtJ844m+XX69H5fqq5dy55z6DW
sGKn78ULPVZPsYH5Y7C+CM6GAn4nYCpau0t52sqsY5epXdeYx4Dc+Wm0CjXrUDEe
Egl4loPKDxJkQqQ/MQiz6Le/UK9vEmnWn1TRXK3ekzNV4NgDfJANBQobOpwt8WVB
rbsC0ON7n680RQnl7PltK9P1AQW5vHsahkoixk/BhcwhkrkZGyDIl9g8Q/Euyoq3
eivKPLz7/rhDE7C1BzFy7v8AjC3w7i9QeHcWOZFAXo5hiDasIAkljDOsdfD4tP5/
wSO6E6pjL3kJ+RH2FCHd7ciQb+IcuXbku64ln8gab4p8jLa/mcMI+V3eWYnZ82Yu
axsa85hAe4wb60cp/rCJo7ihhDTTvGooqtTisOv2nSvCYpcW9qbL6cGjAXECAwEA
AQKCAgEAjz6wnWDP5Y9ts2FrqUZ5ooamnzpUXlpLhrbu3m5ncl4ZF5LfH+QDN0Kl
KvONmHsUhJynC/vROybSJBU4Fu4bms1DJY3C39h/L7g00qhLG7901pgWMpn3QQtU
4P49qpBii20MGhuTsmQQALtV4kB/vTgYfinoawpo67cdYmk8lqzGzzB/HKxZdNTq
s+zOfxRr7PWMo9LyVRuKLjGyYXZJ/coFaobWBi8Y96Rw5NZZRYQQXLIalC/Dhndm
AHckpstEtx2i8f6yxEUOgPvV/gD7Akn92RpqOGW0g/kYpXjGqZQy9PVHGy61sInY
HSkcOspIkJiS6WyJY9JcvJPM6ns4b84GE9qoUlWVF3RWJk1dqYCw5hz4U8LFyxsF
R6WhYiImvjxBLpab55rSqbGkzjI2z+ucDZyl1gqIv9U6qceVsgRyuqdfVN4deU22
LzO5IEDhnGdFqg9KQY7u8zm686Ejs64T1sh0y4GOmGsSg+P6nsqkdlXH8C+Cf03F
lqPFg8WQC7ojl/S8dPmkT5tcJh3BPwIWuvbtVjFOGQc8x0lb+NwK8h2Nsn6LNazS
0H90adh/IyYX4sBMokrpxAi+gMAWiyJHIHLeH2itNKtAQd3qQowbrWNswJSgJzsT
JuJ7uqRKAFkE6nCeAkuj/6KHHMPsfCAffVdyGaWqhoxmPOrnVgECggEBAOrCCwiC
XxwUgjOfOKx68siFJLfHf4vPo42LZOkAQq5aUmcWHbJVXmoxLYSczyAROopY0wd6
Dx8rqnpO7OtZsdJMeBSHbMVKoBZ77hiCQlrljcj12moFaEAButLCdZFsZW4zF/sx
kWIAaPH9vc4MvHHyvyNoB3yQRdevu57X7xGf9UxWuPil/jvdbt9toaraUT6rUBWU
GYPNKaLFsQzKsFWAzp5RGpASkhuiBJ0Qx3cfLyirjrKqTipe3o3gh/5RSHQ6VAhz
gdUG7WszNWk8FDCL6RTWzPOrbUyJo/wz1kblsL3vhV7ldEKFHeEjsDGroW2VUFlS
asAHNvM4/uYcOSECggEBANYH0427qZtLVuL97htXW9kCAT75xbMwgRskAH4nJDlZ
IggDErmzBhtrHgR+9X09iL47jr7dUcrVNPHzK/WXALFSKzXhkG/yAgmt3r14WgJ6
5y7010LlPFrzaNEyO/S4ISuBLt4cinjJsrFpoo0WI8jXeM5ddG6ncxdurKXMymY7
EOF
>> b64 += <<EOF
gff0GJCOMZ65pMSy3A3cSAtjlKnb4fWzuHD5CFbusN4WhCT/tNxGNSpzvxd8GIDs
nY7exs9L230oCCpedVgcbayHCbkChEfoPzL1e1jXjgCwCTgt8GjeEFqc1gXNEaUn
O8AJ4VlR8fRszHm6yR0ZUBdY7UJddxQiYOzt0S1RLlECggEAbdcs4mZdqf3OjejJ
06oTPs9NRtAJVZlppSi7pmmAyaNpOuKWMoLPElDAQ3Q7VX26LlExLCZoPOVpdqDH
KbdmBEfTR4e11Pn9vYdu9/i6o10U4hpmf4TYKlqk10g1Sj21l8JATj/7Diey8scO
sAI1iftSg3aBSj8W7rxCxSezrENzuqw5D95a/he1cMUTB6XuravqZK5O4eR0vrxR
AvMzXk5OXrUEALUvt84u6m6XZZ0pq5XZxq74s8p/x1JvTwcpJ3jDKNEixlHfdHEZ
ZIu/xpcwD5gRfVGQamdcWvzGHZYLBFO1y5kAtL8kI9tW7WaouWVLmv99AyxdAaCB
Y5mBAQKCAQEAzU7AnorPzYndlOzkxRFtp6MGsvRBsvvqPLCyUFEXrHNV872O7tdO
GmsMZl+q+TJXw7O54FjJJvqSSS1sk68AGRirHop7VQce8U36BmI2ZX6j2SVAgIkI
9m3btCCt5rfiCatn2+Qg6HECmrCsHw6H0RbwaXS4RZUXD/k4X+sslBitOb7K+Y+N
Bacq6QxxjlIqQdKKPs4P2PNHEAey+kEJJGEQ7bTkNxCZ21kgi1Sc5L8U/IGy0BMC
PvJxssLdaWILyp3Ws8Q4RAoC5c0ZP0W2j+5NSbi3jsDFi0Y6/2GRdY1HAZX4twem
Q0NCedq1JNatP1gsb6bcnVHFDEGsj/35oQKCAQEAgmWMuSrojR/fjJzvke6Wvbox
FRnPk+6YRzuYhAP/YPxSRYyB5at++5Q1qr7QWn7NFozFIVFFT8CBU36ktWQ39MGm
cJ5SGyN9nAbbuWA6e+/u059R7QL+6f64xHRAGyLT3gOb1G0N6h7VqFT25q5Tq0rc
Lf/CvLKoudjv+sQ5GKBPT18+zxmwJ8YUWAsXUyrqoFWY/Tvo5yLxaC0W2gh3+Ppi
EDqe4RRJ3VKuKfZxHn5VLxgtBFN96Gy0+Htm5tiMKOZMYAkHiL+vrVZAX0hIEuRZ
EOF
>> der = b64.unpack("m").first
>> c = DerParse.new(der).first_node.first_child
>> version = c.value
=> 0
>> c = c.next_node
>> n = c.value
=> 80071596234464993385068908004931... # (etc)
>> c = c.next_node
>> e = c.value
=> 65537
>> c = c.next_node
>> d = c.value
=> 58438813486895877116761996105770... # (etc)
>> c = c.next_node
>> p = c.value
=> 29635449580247160226960937109864... # (etc)
>> c = c.next_node
>> q = c.value
=> 27018856595256414771163410576410... # (etc)
What I've done, in case you don't speak Ruby, is take the two chunks of plausible-looking base64 data, chuck them together into a variable named b64, unbase64 it into a variable named der, pass that into a new DerParse instance, and then walk the DER value tree until I got all the values I need. Interestingly, the q value actually traverses the split in the two chunks, which means that there's always the possibility that there are lines missing from the key. However, since p and q are supposed to be prime, we can sanity check them to see if corruption is likely to have occurred:
>> require "openssl"
>> OpenSSL::BN.new(p).prime?
=> true
>> OpenSSL::BN.new(q).prime?
=> true
Excellent! The chances of a corrupted file producing valid-but-incorrect prime numbers aren't huge, so we can be fairly confident that we've got the real p and q. Now, with the help of another one of my creations we can use e, p, and q to create a fully-operational battle key:
>> require "openssl/pkey/rsa"
>> k = OpenSSL::PKey::RSA.from_factors(p, q, e)
=> #<OpenSSL::PKey::RSA:0x0000559d5903cd38>
>> k.valid?
=> true
>> k.verify(OpenSSL::Digest::SHA256.new, k.sign(OpenSSL::Digest::SHA256.new, "bob"), "bob")
=> true
and there you have it. One fairly redacted-looking private key brought back to life by maths and far too much free time. Sorry Mr. Finn, I hope you're not still using that key on anything Internet-facing.

What About Other Key Types? EC keys are very different beasts, but they have much the same problems as RSA keys. A typical EC key contains both private and public data, and the public portion is twice the size so only about 1/3 of the data in the key is private material. It is quite plausible that you can redact an EC key and leave all the actually private bits exposed.

What Do We Do About It? In short: don't ever try and redact real private keys. For documentation purposes, just put KEY GOES HERE in the appropriate spot, or something like that. Store your secrets somewhere that isn't a public (or even private!) git repo. Generating a dummy private key and sticking it in there isn't a great idea, for different reasons: people have this odd habit of reusing demo keys in real life. There's no need to encourage that sort of thing.
  1. Technically the pieces aren't 100% aligned with the underlying DER, because of how base64 works. I felt it was easier to understand if I stuck to chopping up the base64, rather than decoding into DER and then chopping up the DER.

18 March 2020

Antoine Beaupré: How can I trust this git repository?

Join me in the rabbit hole of git repository verification, and how we could improve it.

Problem statement As part of my work on automating install procedures at Tor, I ended up doing things like:
git clone REPO
./REPO/bootstrap.sh
... something eerily similar to the infamous curl pipe bash method which I often decry. As a short-term workaround, I relied on the SHA-1 checksum of the repository to make sure I have the right code, by running this both on a "trusted" (ie. "local") repository and the remote, then visually comparing the output:
$ git show-ref master
9f9a9d70dd1f1e84dec69a12ebc536c1f05aed1c refs/heads/master
One problem with this approach is that SHA-1 is now considered as flawed as MD5 so it can't be used as an authentication mechanism anymore. It's also fundamentally difficult to compare hashes for humans. The other flaw with comparing local and remote checksums is that we assume we trust the local repository. But how can I trust that repository? I can either:
  1. audit all the code present and all the changes done to it after
  2. or trust someone else to do so
The first option here is not practical in most cases. In this specific use case, I have audited the source code -- I'm the author, even -- what I need is to transfer that code over to another server. (Note that I am replacing those procedures with Fabric, which makes this use case moot for now as the trust path narrows to "trust the SSH server" which I already had anyways. But it's still important for my fellow Tor developers who worry about trusting the git server, especially now that we're moving to GitLab.) But anyways, in most cases, I do need to trust some other fellow developer I collaborate with. To do this, I would need to trust the entire chain between me and them:
  1. the git client
  2. the operating system
  3. the hardware
  4. the network (HTTPS and the CA cartel, specifically)
  5. then the hosting provider (and that hardware/software stack)
  6. and then backwards all the way back to that other person's computer
I want to shorten that chain as much as possible, make it "peer to peer", so to speak. Concretely, it would eliminate the hosting provider and the network, as attackers.

OpenPGP verification My first reaction is (perhaps perversely) to "use OpenPGP" for this. I figured that if I sign every commit, then I can just check the latest commit and see if the signature is good. The first problem here is that this is surprisingly hard. Let's pick some arbitrary commit I did recently:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400
    fix test autoloading
    pytest only looks for file names matching  test  by default. We inline
    tests inside the source code directly, so hijack that.
diff --git a/fabric_tpa/pytest.ini b/fabric_tpa/pytest.ini
new file mode 100644
index 0000000..71004ea
--- /dev/null
+++ b/fabric_tpa/pytest.ini
@@ -0,0 +1,3 @@
+[pytest]
+# we inline tests directly in the source code
+python_files = *.py
That's the output of git log -p in my local repository. I signed that commit, yet git log is not telling me anything special. To check the signature, I need something special: --show-signature, which looks like this:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature faite le lun 16 mar 2020 14:37:53 EDT
gpg:                avec la clef RSA 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Bonne signature de « Antoine Beaupré <anarcat@orangeseeds.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@torproject.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@anarc.at> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@koumbit.org> » [ultime]
gpg:                 alias « Antoine Beaupré <anarcat@debian.org> » [ultime]
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400
    fix test autoloading
    pytest only looks for file names matching  test  by default. We inline
    tests inside the source code directly, so hijack that.
Can you tell if this is a valid signature? If you speak a little French, maybe you can! But even if you could, you are unlikely to see that output on your own computer. What you would see instead is:
commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg:                using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
Author: Antoine Beaupré <anarcat@debian.org>
Date:   Mon Mar 16 14:37:28 2020 -0400
    fix test autoloading
    pytest only looks for file names matching  test  by default. We inline
    tests inside the source code directly, so hijack that.
Important part: Can't check signature: No public key. No public key. Because of course you would see that. Why would you have my key lying around, unless you're me. Or, to put it another way, why would that server I'm installing from scratch have a copy of my OpenPGP certificate? Because I'm a Debian developer, my key is actually part of the 800 keys in the debian-keyring package, signed by the APT repositories. So I have a trust path. But that won't work for someone who is not a Debian developer. It will also stop working when my key expires in that repository, as it already has on Debian buster (current stable). So I can't assume I have a trust path there either. One could work with a trusted keyring like we do in the Tor and Debian project, and only work inside that project, that said. But I still feel uncomfortable with those commands. Both git log and git show will happily succeed (return code 0 in the shell) even though the signature verification failed on the commits. Same with git pull and git merge, which will happily push your branch ahead even if the remote has unsigned or badly signed commits. To actually verify commits (or tags), you need the git verify-commit (or git verify-tag) command, which seems to do the right thing:
$ LANG=C.UTF-8 git verify-commit b3c538898b0ed4e31da27fc9ca22cb55e1de0000
gpg: Signature made Mon Mar 16 14:37:53 2020 EDT
gpg:                using RSA key 7B164204D096723B019635AB3EA1DDDDB261D97B
gpg: Can't check signature: No public key
[1]$
At least it fails with some error code (1, above). But it's not flexible: I can't use it to verify that a "trusted" developer (say one that is in a trusted keyring) signed a given commit. Also, it is not clear what a failure means. Is a signature by an expired certificate okay? What if the key is signed by some random key in my personal keyring? Why should that be trusted?
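One way to approximate that missing "verify against a trusted keyring" step today is to point GnuPG at a dedicated keyring and parse the status output yourself. The following is my own rough sketch, not an existing git feature: it assumes a GNUPGHOME directory into which only the certificates you trust have been imported, and a set of their full fingerprints:
import os
import subprocess

def signed_by_trusted_key(commit, gnupg_home, trusted_fingerprints):
    # Run the verification against a dedicated GnuPG home directory so
    # that only deliberately imported certificates are considered.
    env = dict(os.environ, GNUPGHOME=gnupg_home)
    result = subprocess.run(
        ["git", "verify-commit", "--raw", commit],
        env=env, capture_output=True, text=True,
    )
    if result.returncode != 0:
        return False
    # git relays gpg's machine-readable status lines on stderr; the
    # VALIDSIG line carries the full fingerprint of the signing key.
    for line in result.stderr.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "VALIDSIG":
            return fields[2] in trusted_fingerprints
    return False
Of course, this still inherits all the GnuPG-parsing caveats discussed in the next section.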

Worrying about git and GnuPG In general, I'm worried about git's implementation of OpenPGP signatures. There have been numerous cases of interoperability problems with GnuPG specifically that led to security issues, like EFAIL or SigSpoof. It would be surprising if such a vulnerability did not exist in git. Even if git did everything "just right" (which I have myself found impossible to do when writing code that talks with GnuPG), what does it actually verify? The commit's SHA-1 checksum? The tree's checksum? The entire archive as a zip file? I would bet it signs the commit's SHA-1 sum, but I just don't know off the top of my head, and neither git-commit nor git-verify-commit says exactly what is happening. I had an interesting conversation with a fellow Debian developer (dkg) about this and we had to admit those limitations:
<anarcat> i'd like to integrate pgp signing into tor's coding practices more, but so far, my approach has been "sign commits" and the verify step was "TBD" <dkg> that's the main reason i've been reluctant to sign git commits. i haven't heard anyone offer a better subsequent step. if torproject could outline something useful, then i'd be less averse to the practice. i'm also pretty sad that git remains stuck on sha1, esp. given the recent demonstrations. all the fancy strong signatures you can make in git won't matter if the underlying git repo gets changed out from under the signature due to sha1's weakness
In other words, even if git implements the arcane GnuPG dialect just so, and would allow us to setup the trust chain just right, and would give us meaningful and workable error messages, it still would fail because it's still stuck in SHA-1. There is work underway to fix that, but in February 2020, Jonathan Corbet described that work as being in a "relatively unstable state", which is hardly something I would like to trust to verify code. Also, when you clone a fresh new repository, you might get an entirely different repository, with a different root and set of commits. The concept of "validity" of a commit, in itself, is hard to establish in this case, because a hostile server could put you backwards in time, on a different branch, or even on an entirely different repository. Git will warn you about a different repository root with warning: no common commits, but that's easy to miss. And complete branch switches, rebases and resets from upstream are hardly more noticeable: only a tiny plus sign (+) instead of a star (*) will tell you that a reset happened, along with a warning (forced update) on the same line. Miss those and your git history can be compromised.

Possible ways forward I don't consider the current implementation of OpenPGP signatures in git to be sufficient. Maybe, eventually, it will mature away from SHA-1 and the interface will be more reasonable, but I don't see that happening in the short term. So what do we do?

git evtag The git-evtag extension is a replacement for git tag -s. It's not designed to sign commits (it only handles tags) but at least it uses a stronger algorithm (SHA-512) to checksum the tree, and will include everything in that tree, including blobs. If that sounds expensive to you, don't worry too much: it takes about 5 seconds to tag the Linux kernel, according to the author. Unfortunately, that checksum is then signed with GnuPG, in a manner similar to git itself, in that it exposes GnuPG output (which can be confusing) and is likely similarly vulnerable to mis-implementation of the GnuPG dialect as git itself. It also does not allow you to specify a keyring to verify against, so you need to trust GnuPG to make sense of the garbage that lives in your personal keyring (and, trust me, it doesn't). And besides, git-evtag is fundamentally the same as signed git tags: checksum everything and sign with GnuPG. The difference is it uses SHA-512 instead of SHA-1, but that's something git will eventually fix itself anyways.

kernel patch attestations The kernel also faces this problem. Linus Torvalds signs the releases with GnuPG, but patches fly all over mailing lists without any form of verification apart from clear-text email. So Konstantin Ryabitsev has proposed a new protocol to sign git patches which uses SHA256 to checksum the patch metadata, commit message and the patch itself, and then sign that with GnuPG. It's unclear to me what, if anything, this solves. As dkg argues, it would seem better to add OpenPGP support to git-send-email and teach git tools (e.g. git-am) to recognize that, at least if you're going to keep using OpenPGP anyways. And furthermore, it doesn't resolve the problems associated with verifying a full archive either, as it only attests "patches".

jcat Unhappy with the current state of affairs, the author of fwupd (Richard Hughes) wrote his own protocol as well, called jcat, which provides signed "catalog files" similar to the ones provided in Microsoft Windows. It consists of "gzip-compressed JSON catalog files, which can be used to store GPG, PKCS-7 and SHA-256 checksums for each file". So yes, it is yet again another wrapper to GnuPG, probably with all the flaws detailed above, on top of being a niche implementation, disconnected from git.

The Update Framework One more thing dkg correctly identified is:
<dkg> anarcat: even if you could do exactly what you describe, there are still some interesting wrinkles that i think would be problems for you. the big one: "git repo's latest commits" is a loophole big enough to drive a truck through. if your adversary controls that repo, then they get to decide which commits to include in the repo. (since every git repo is a view into the same git repo, just some have more commits than others)
In other words, unless you have a repository that has frequent commits (either because of activity or by a bot generating fake commits), you have to rely on the central server to decide what "the latest version" is. This is the kind of problem that binary package distribution systems like APT and TUF solve correctly. Unfortunately, those don't apply to source code distribution, at least not in git form: TUF only deals with "repositories" and binary packages, and APT only deals with binary packages and source tarballs. That said, there's actually no reason why git could not support the TUF specification. Maybe TUF could be the solution to ensure end-to-end cryptographic integrity of the source code itself. OpenPGP-signed tarballs are nice, and signed git tags can be useful, but from my experience, a lot of OpenPGP (or, more accurately, GnuPG) derived tools are brittle and do not offer clear guarantees, and definitely not to the level that TUF tries to address. This would require changes on the git servers and clients, but I think it would be worth it.

Other Projects

OpenBSD There are other tools trying to do parts of what GnuPG is doing, for example minisign and OpenBSD's signify. But they do not integrate with git at all right now. Although I did find a hack to use signify with git, it's kind of gross...

Golang Unsurprisingly, this is a problem everyone is trying to solve. Golang is planning on hosting a notary which would leverage a "certificate-transparency-style tamper-proof log" which would be run by Google (see the spec for details). But that doesn't resolve the "evil server" attack, if we treat Google as an adversary (and we should).

Python Python had OpenPGP going for a while on PyPI, but it's unclear if it ever did anything at all. Now the plan seems to be to use TUF but my hunch is that the complexity of the specification is keeping that from moving ahead.

Docker Docker and the container ecosystem have, in theory, moved to TUF in the form of Notary, "a project that allows anyone to have trust over arbitrary collections of data". In practice however, in my somewhat limited experience, setting up TUF and image verification in Docker is far from trivial.

Android and iOS Even in what is possibly one of the strongest models (at least in terms of user friendliness), mobile phones are surprisingly unclear about those kinds of questions. I had to ask if Android had end-to-end authentication and I am still not clear on the answer. I have no idea what iOS does.

Conclusion One of the core problems with everything here is the common usability aspect of cryptography, and specifically the usability of verification procedures. We have become pretty good at encryption. The harder part (and a requirement for proper encryption) is verification. It seems that problem still remains unsolved, in terms of usability. Even Signal, widely considered to be a success in terms of adoption and usability, doesn't properly solve that problem, as users regularly ignore "The security number has changed" warnings... So, even though they deserve a lot of credit in other areas, it seems unlikely that hardcore C hackers (e.g. git and kernel developers) will be able to resolve that problem without at least a little bit of help. And since TUF seems like the state-of-the-art specification in this area, it would seem wise to start adopting it in the git community as well. Update: git 2.26 introduced a new gpg.minTrustLevel setting to "tell various signature verification codepaths the required minimum trust level", presumably to control how Git will treat keys in your keyrings, assuming the "trust database" is valid and up to date. For an interesting narrative of how "normal" (without PGP) git verification can fail, see also A Git Horror Story: Repository Integrity With Signed Commits.

2 March 2020

Antoine Beaupré: Moving dconf entries to git

I've been managing my UNIX $HOME repository with version control for almost two decades now (first under CVS, then with git). Once in a while, I find a little hack to make this work better. Today, it's dconf/gsettings, or more specifically, Workrave, that I want to put in git. I noticed my laptop was extremely annoying compared with my office workstation and realized I never figured out how to write Workrave's configuration to git. The reason is that its configuration is stored in dconf, a binary database format, but, blissfully, I had forgotten about this and tried to figure out where the heck its config was. I was about to give up when I re-remembered this and figured I would just do a quick search ("dconf commit to git"), which brought me to Josh Triplett's DebConf 14 talk about this exact topic. The slides are... a little terse, but I could figure out the gist of it. The key is to change the DCONF_PROFILE environment variable to point to a new config file (say in your .bashrc):
export DCONF_PROFILE=$HOME/.config/dconf/profile
That file (~/.config/dconf/profile) should be created with the following content:
user-db:user
service-db:keyfile/user
  1. The first line is the default: store everything in this huge binary file.
  2. The second is the magic: it stores configuration in a precious text file, in .config/dconf/user.txt specifically.
Then the last step was to migrate config between the two. For that I need a third config file, a DCONF_PROFILE that has only the text database so settings are forcibly written there, say ~/.config/dconf/profile-edit:
service-db:keyfile/user
And then I can migrate my workrave configuration with:
gsettings list-recursively org.workrave | while read schema key val ; do DCONF_PROFILE=~/.config/dconf/profile-edit gsettings set "$schema" "$key" "$val" ; done
Of course, a bunch of those settings are garbage and do not differ from the default. Unfortunately, there doesn't seem to be a way to tell gsettings to only list non-default settings, so I had to do things by hand from there, by comparing my generated config with:
DCONF_PROFILE=/dev/null gsettings list-recursively org.workrave
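If doing that comparison by hand gets tedious, a small script can do the diff. Here is a rough sketch (my own, not Triplett's tooling) that prints only the keys whose current value differs from the default, using the same DCONF_PROFILE=/dev/null trick; the org.workrave schema name is just the one from this example.
import os
import subprocess

def list_settings(schema, profile=None):
    # Run `gsettings list-recursively` with an optional DCONF_PROFILE override.
    env = dict(os.environ)
    if profile is not None:
        env["DCONF_PROFILE"] = profile
    out = subprocess.run(
        ["gsettings", "list-recursively", schema],
        capture_output=True, text=True, env=env, check=True,
    ).stdout
    settings = {}
    for line in out.splitlines():
        parts = line.split(None, 2)  # "schema key value", value may contain spaces
        if len(parts) == 3:
            schema_name, key, value = parts
            settings[(schema_name, key)] = value
    return settings

current = list_settings("org.workrave")
defaults = list_settings("org.workrave", profile="/dev/null")
for (schema_name, key), value in sorted(current.items()):
    if defaults.get((schema_name, key)) != value:
        print(schema_name, key, value)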
I finally ended up with the following user.txt, which is now my workrave config:
[org/workrave/timers/daily-limit]
snooze=1800
limit=25200
[org/workrave/timers/micro-pause]
auto-reset=10
snooze=150
limit=900
[org/workrave/timers/rest-break]
auto-reset=600
snooze=1800
limit=10800
[org/workrave/breaks/micro-pause]
max-preludes=0
That's a nice setup: "YOLO" settings end up in the binary database that I don't care about tracking in git, and a precious few snowflakes get tracked in a text file. Triplett also made a command to change settings in the text file, but I don't think I need to bother with that. It's more critical to copy settings between the two, in my experience, as I rarely have this moment: "oh, I know exactly the key setting I want to change and I'll write it down". What actually happens is that I'm making a change in a GUI and later realize it should be synchronized over. (It looks like Triplett does have a tool to do those diffs and transitions, but unfortunately git://joshtriplett.org/git/home doesn't respond at this time so I can't find that source code.) (Update: that git repository can be downloaded with:
git clone https://joshtriplett.org/git/home
It looks like dconfc is designed to copy settings between the two databases. So my above loop would be more easily written as:
dconf dump /org/workrave/ | DCONF_PROFILE=~/.config/dconf/profile-edit dconf load /org/workrave/
Certainly simplifies things. dconfd is also nice although it relies on the above dconfe script which makes it (IMHO) needlessly hard to deploy.) Now if only Firefox bookmarks were reasonable again...

2 November 2017

Steinar H. Gunderson: Introducing Narabu, part 5: Encoding

Narabu is a new intraframe video codec. You probably want to read part 1, part 2, part 3 and part 4 first. At this point, we've basically caught up with where I am, so things are less set in stone. However, let's look at what qualitatively differentiates encoding from decoding; unlike in interframe video codecs (where you need to do motion vector search and stuff), encoding and decoding are very much mirror images of each other, so we can intuitively expect them to be relatively similar in performance. The procedure is DCT, quantization, entropy coding, and that's it.
One important difference is in the entropy coding. Since our rANS encoding is non-adaptive (a choice made largely for simplicity, but also because our streams are so short), it works by first signaling a distribution and then encoding each coefficient using that distribution. However, we don't know that distribution until we've DCT-ed all blocks in the picture, so we can't just DCT each block and entropy code the coefficients on-the-fly. There are a few ways to deal with this: As you can see, tons of possible strategies here. For simplicity, I've ended up with the former, although this could very well be changed at some point. There are some interesting subproblems, though:
First of all, we need to decide the data type of this temporary array. The DCT tends to concentrate energy into fewer coefficients (which is a great property for compression!), so even after quantization, some of them will get quite large. This means we cannot store them in an 8-bit texture; however, even the bigger ones are very rarely bigger than 10 bits (including a sign bit), so using 16-bit textures wastes precious memory bandwidth. I ended up slicing the coefficients by horizontal index and then pairing them up (so that we generate pairs 0+7, 1+6, 2+5 and 3+4 for the first line of the 8x8 DCT block, 8+15, 9+14, 10+13 and 11+12 for the next line, etc.). This allows us to pack two coefficients into a 16-bit texture, for an average of 8 bits per coefficient, which is what we want. It makes for some slightly fiddly clamping and bit-packing since we are packing signed values, but nothing really bad.
Second, and perhaps surprisingly enough, counting efficiently is nontrivial. We want a histogram over which coefficients are used most often, i.e., for each coefficient, we want something like ++counts[dist][coeff] (recall we have four distinct distributions). However, since we're in a massively parallel algorithm, this add needs to be atomic, and since values like e.g. 0 are super-common, all of our GPU cores will end up fighting over the cache line containing counts[dist][0]. This is not fast. Think 10 ms/frame not fast. Local memory to the rescue again; all modern GPUs have fast atomic adds to local memory (basically integrating adders into the cache, as I understand it, although I might have misunderstood here). This means we just make a suitably large local group, build up our sub-histogram in local memory and then add all nonzero buckets (atomically) to the global histogram. This improved performance dramatically, to the point where it was well below 0.1 ms/frame.
However, our histogram is still a bit raw; it sums to 1280x720 = 921,600 values, but we want an approximation that sums to exactly 4096 (12 bits), with some additional constraints (like no coefficient that actually occurs ending up with a zero frequency). Charles Bloom has an exposition of a nearly optimal algorithm, although it took me a while to understand it.
The basic idea is: Make a good approximation by multiplying each frequency by 4096/921600 (rounding intelligently). This will give you something that very nearly sums to 4096, either above or below, e.g. 4101. For each step you're above or below the total (five in this case), find the best single coefficient to adjust (most entropy gain, or least loss); Bloom is using a heap, but on the GPU, each core is slow but we have many of them, so it's better just to try all 256 possibilities in parallel and have a simple voting scheme through local memory to find the best one. And then finally, we want a cumulative distribution function, but that is simple through a parallel prefix sum on the 256 elements. (A rough CPU-side sketch of this normalization appears at the end of this post.)
And then finally, we can take our DCT coefficients and the finished rANS distribution, and write the data! We'll have to leave some headroom for the streams (I've allowed 1 kB for each, which should be ample except for adversarial data, and we'll probably solve that just by truncating the writes and accepting the corruption), but we'll compact them when we write to disk.
Of course, the Achilles heel here is performance. Where decoding 720p (luma only) on my GTX 950 took 0.4 ms or so, encoding is 1.2 ms or so, which is just too slow. Remember that 4:2:2 is twice that, and we want multiple streams, so 2.4 ms per frame is eating too much. I don't really know why it's so slow; the DCT isn't bad, the histogram counting is fast, it's just the rANS shader that's slow for some reason I don't understand, and also haven't had the time to really dive deeply into. Of course, a faster GPU would be faster, but I don't think I can reasonably demand that people get a 1080 just to encode a few video streams.
Due to this, I haven't really worked out the last few kinks. In particular, I haven't implemented DC coefficient prediction (it needs to be done before tallying up the histograms, so it can be a bit tricky to do efficiently, although perhaps local memory will help again to send data between neighboring DCT blocks). And I also haven't properly done bounds checking in the encoder or decoder, but it should hopefully be simple as long as we're willing to accept that evil input decodes into garbage instead of flagging errors explicitly. It also depends on a GLSL extension (which my Haswell laptop doesn't have) to get 64-bit divides when precalculating the rANS tables; I've got some code to simulate 64-bit divides using 32-bit, but it doesn't work yet.
The code as it currently stands is in this Git repository; you can consider it licensed under GPLv3. It's really not very user-friendly at this point, though, and rather rough around the edges. Next time, we'll wrap up with some performance numbers. Unless I magically get more spare time in the meantime and/or some epiphany about how to make the encoder faster. :-)
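To make that normalization concrete, here is a rough CPU-side Python sketch of my reading of the adjustment scheme (the real implementation is a GLSL shader and Bloom's version uses a heap; the function name and the entropy-cost heuristic below are mine, not taken from the Narabu code):
import math

# Sketch only: scale a raw histogram so it sums to `target`, keep every
# symbol that actually occurs at frequency >= 1, then nudge single symbols
# up or down until the total is exactly `target`.
def normalize_histogram(counts, target=4096):
    total = sum(counts)
    freqs = [max(1, round(c * target / total)) if c else 0 for c in counts]
    diff = sum(freqs) - target
    step = -1 if diff > 0 else 1
    while diff != 0:
        # Pick the symbol where a +/-1 change costs the fewest estimated
        # bits (or gains the most); never drop an occurring symbol to zero.
        best, best_cost = None, None
        for i, (c, f) in enumerate(zip(counts, freqs)):
            if c == 0 or f + step < 1:
                continue
            cost = c * (math.log2(f) - math.log2(f + step))
            if best_cost is None or cost < best_cost:
                best, best_cost = i, cost
        if best is None:  # degenerate case; a real encoder would handle it
            break
        freqs[best] += step
        diff += step
    return freqs
On the GPU, the inner "find the best symbol" loop is what gets replaced by trying all 256 candidates in parallel and voting through local memory, as described above.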

20 August 2017

Vincent Bernat: IPv6 route lookup on Linux

TL;DR: With its implementation of IPv6 routing tables using radix trees, Linux offers subpar performance (450 ns for a full view of 40,000 routes) compared to IPv4 (50 ns for a full view of 500,000 routes) but fair memory usage (20 MiB for a full view).
In a previous article, we had a look at IPv4 route lookup on Linux. Let's see how different IPv6 is.

Lookup trie implementation Looking up a prefix in a routing table comes down to finding the most specific entry matching the requested destination. A common structure for this task is the trie, a tree structure where each node's key has its parent's key as a prefix. With IPv4, Linux uses a level-compressed trie (or LPC-trie), providing good performance with low memory usage. For IPv6, Linux uses a more classic radix tree (or Patricia trie). There are three reasons for not sharing the implementation:
  • The IPv6 implementation (introduced in Linux 2.1.8, 1996) predates the IPv4 implementation based on LPC-tries (in Linux 2.6.13, commit 19baf839ff4a).
  • The feature set is different. Notably, IPv6 supports source-specific routing1 (since Linux 2.1.120, 1998).
  • The IPv4 address space is denser than the IPv6 address space. Level-compression is therefore quite efficient with IPv4. This may not be the case with IPv6.
The trie in the below illustration encodes 6 prefixes.
[Figure: Radix tree]
For a more in-depth explanation of the different ways to encode a routing table into a trie and for a better understanding of radix trees, see the explanations for IPv4. The following figure shows the in-memory representation of the previous radix tree. Each node corresponds to a struct fib6_node. When a node has the RTN_RTINFO flag set, it embeds a pointer to a struct rt6_info containing information about the next-hop.
[Figure: Memory representation of a routing table]
The fib6_lookup_1() function walks the radix tree in two steps:
  1. walking down the tree to locate the potential candidate, and
  2. checking the candidate and, if needed, backtracking until a match.
Here is a slightly simplified version without source-specific routing:
static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
                                       struct in6_addr  *addr)
{
    struct fib6_node *fn;
    __be32 dir;

    /* Step 1: locate potential candidate */
    fn = root;
    for (;;) {
        struct fib6_node *next;
        dir = addr_bit_set(addr, fn->fn_bit);
        next = dir ? fn->right : fn->left;
        if (next) {
            fn = next;
            continue;
        }
        break;
    }

    /* Step 2: check prefix and backtrack if needed */
    while (fn) {
        if (fn->fn_flags & RTN_RTINFO) {
            struct rt6key *key;
            key = &fn->leaf->rt6i_dst;
            if (ipv6_prefix_equal(&key->addr, addr, key->plen)) {
                if (fn->fn_flags & RTN_RTINFO)
                    return fn;
            }
        }
        if (fn->fn_flags & RTN_ROOT)
            break;
        fn = fn->parent;
    }

    return NULL;
}

Caching While IPv4 lost its route cache in Linux 3.6 (commit 5e9965c15ba8), IPv6 still has a caching mechanism. However, cache entries are directly put in the radix tree instead of a distinct structure. Since Linux 2.1.30 (1997) and until Linux 4.2 (commit 45e4fd26683c), almost any successful route lookup inserts a cache entry in the radix tree. For example, a router forwarding a ping between 2001:db8:1::1 and 2001:db8:3::1 would get these two cache entries:
$ ip -6 route show cache
2001:db8:1::1 dev r2-r1  metric 0
    cache
2001:db8:3::1 via 2001:db8:2::2 dev r2-r3  metric 0
    cache
These entries are cleaned up by the ip6_dst_gc() function controlled by the following parameters:
$ sysctl -a | grep -F net.ipv6.route
net.ipv6.route.gc_elasticity = 9
net.ipv6.route.gc_interval = 30
net.ipv6.route.gc_min_interval = 0
net.ipv6.route.gc_min_interval_ms = 500
net.ipv6.route.gc_thresh = 1024
net.ipv6.route.gc_timeout = 60
net.ipv6.route.max_size = 4096
net.ipv6.route.mtu_expires = 600
The garbage collector is triggered at most every 500 ms when there are more than 1024 entries or at least every 30 seconds. The garbage collection won't run for more than 60 seconds, except if there are more than 4096 routes. When running, it will first delete entries older than 30 seconds. If the number of cache entries is still greater than 4096, it will continue to delete more recent entries (but no more recent than 512 jiffies, which is the value of gc_elasticity) after a 500 ms pause. Starting from Linux 4.2 (commit 45e4fd26683c), only a PMTU exception would create a cache entry. A router doesn't have to handle those exceptions, so only hosts would get cache entries. And they should be pretty rare. Martin KaFai Lau explains:
Out of all IPv6 RTF_CACHE routes that are created, the percentage that has a different MTU is very small. In one of our end-user facing proxy server, only 1k out of 80k RTF_CACHE routes have a smaller MTU. For our DC traffic, there is no MTU exception.
Here is what a cache entry with a PMTU exception looks like:
$ ip -6 route show cache
2001:db8:1::50 via 2001:db8:1::13 dev out6  metric 0
    cache  expires 573sec mtu 1400 pref medium

Performance We consider three distinct scenarios:
Excerpt of an Internet full view
In this scenario, Linux acts as an edge router attached to the default-free zone. Currently, the size of such a routing table is a little bit above 40,000 routes.
/48 prefixes spread linearly with different densities
Linux acts as a core router inside a datacenter. Each customer or rack gets one or several /48 networks, which need to be routed around. With a density of 1, /48 subnets are contiguous.
/128 prefixes spread randomly in a fixed /108 subnet
Linux acts as a leaf router for a /64 subnet with hosts getting their IP using autoconfiguration. It is assumed all hosts share the same OUI and therefore the first 40 bits are fixed. In this scenario, neighbor reachability information for the /64 subnet is converted into routes by some external process and redistributed among other routers sharing the same subnet2.

Route lookup performance With the help of a small kernel module, we can accurately benchmark3 the ip6_route_output_flags() function and correlate the results with the radix tree size:
[Figure: Maximum depth and lookup time]
Getting meaningful results is challenging due to the size of the address space. None of the scenarios have a fallback route and we only measure time for successful hits4. For the full view scenario, only the range from 2400::/16 to 2a06::/16 is scanned (it contains more than half of the routes). For the /128 scenario, the whole /108 subnet is scanned. For the /48 scenario, the range from the first /48 to the last one is scanned. For each range, 5000 addresses are picked semi-randomly. This operation is repeated until we get 5000 hits or until 1 million tests have been executed. The relation between the maximum depth and the lookup time is imperfect and I can't explain the difference in performance between the different densities of the /48 scenario. We can extract two important performance points:
  • With a full view, the lookup time is 450 ns. This is almost ten times the budget for forwarding at 10 Gbps, which is about 50 ns.
  • With an almost empty routing table, the lookup time is 150 ns. This is still over the time budget for forwarding at 10 Gbps.
With IPv4, the lookup time for an almost empty table was 20 ns while the lookup time for a full view (500,000 routes) was a bit above 50 ns. How to explain such a difference? First, the maximum depth of the IPv4 LPC-trie with 500,000 routes was 6, while the maximum depth of the IPv6 radix tree for 40,000 routes is 40. Second, while both IPv4's fib_lookup() and IPv6's ip6_route_output_flags() functions have a fixed cost implied by the evaluation of routing rules, IPv4 has several optimizations when the rules are left unmodified5. Those optimizations are removed on the first modification. If we cancel those optimizations, the lookup time for IPv4 is impacted by about 30 ns. This still leaves a 100 ns difference with IPv6 to be explained. Let's compare how time is spent in each lookup function. Here is a CPU flamegraph for IPv4's fib_lookup():
[Figure: IPv4 route lookup flamegraph]
Only 50% of the time is spent in the actual route lookup. The remaining time is spent evaluating the routing rules (about 30 ns). This ratio is dependent on the number of routes we inserted (only 1000 in this example). It should be noted the fib_table_lookup() function is executed twice: once with the local routing table and once with the main routing table. The equivalent flamegraph for IPv6's ip6_route_output_flags() is depicted below:
[Figure: IPv6 route lookup flamegraph]
Here is an approximate breakdown of the time spent:
  • 50% is spent in the route lookup in the main table,
  • 15% is spent in handling locking (IPv4 is using the more efficient RCU mechanism),
  • 5% is spent in the route lookup of the local table,
  • most of the remaining is spent in routing rule evaluation (about 100 ns)6.
Why is the evaluation of routing rules less efficient with IPv6? Again, I don't have a definitive answer.

History The following graph shows the performance progression of route lookups through Linux history:
[Figure: IPv6 route lookup performance progression]
All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels as well as current ones. The kernel configuration is the default one with CONFIG_SMP, CONFIG_IPV6, CONFIG_IPV6_MULTIPLE_TABLES and CONFIG_IPV6_SUBTREES options enabled. Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark. There are three notable performance changes:
  • In Linux 3.1, Eric Dumazet slightly delays the copy of route metrics to fix the undesirable sharing of route-specific metrics by all cache entries (commit 21efcfa0ff27). Each cache entry now gets its own metrics, which explains the performance hit for the non-/128 scenarios.
  • In Linux 3.9, Yoshifuji Hideaki removes the reference to the neighbor entry in struct rt6_info (commit 887c95cc1da5). This should have led to a performance increase. The small regression may be due to cache-related issues.
  • In Linux 4.2, Martin KaFai Lau prevents the creation of cache entries for most route lookups. The most noticeable performance improvement comes with commit 4b32b5ad31a6. The second one is from commit 45e4fd26683c, which effectively removes the creation of cache entries, except for PMTU exceptions.

Insertion performance Another interesting performance-related metric is the insertion time. Linux is able to insert a full view in less than two seconds. For some reason, the insertion time is not linear above 50,000 routes and climbs very fast to 60 seconds for 500,000 routes.
[Figure: Insertion time]
Despite its more complex insertion logic, the IPv4 subsystem is able to insert 2 million routes in less than 10 seconds.

Memory usage Radix tree nodes (struct fib6_node) and routing information (struct rt6_info) are allocated with the slab allocator7. It is therefore possible to extract the information from /proc/slabinfo when the kernel is booted with the slab_nomerge flag:
# sed -ne 2p -e '/^ip6_dst/p' -e '/^fib6_nodes/p' /proc/slabinfo | cut -f1 -d:
   name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
fib6_nodes         76101  76104     64   63    1
ip6_dst_cache      40090  40090    384   10    1
In the above example, the used memory is 76104 × 64 + 40090 × 384 bytes (about 20 MiB). The number of struct rt6_info matches the number of routes while the number of nodes is roughly twice the number of routes:
[Figure: Nodes]
The memory usage is therefore quite predictable and reasonable, as even a small single-board computer can support several full views (20 MiB for each):
[Figure: Memory usage]
The LPC-trie used for IPv4 is more efficient: when 512 MiB of memory is needed for IPv6 to store 1 million routes, only 128 MiB are needed for IPv4. The difference is mainly due to the size of struct rt6_info (336 bytes) compared to the size of IPv4's struct fib_alias (48 bytes): IPv4 puts most information about next-hops in struct fib_info structures that are shared with many entries.
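The same arithmetic can be scripted if you want to track it over time. A quick Python sketch (assuming, as above, a kernel booted with slab_nomerge and enough privileges to read /proc/slabinfo; the function name is mine):
# Sketch: read num_objs and objsize for the two IPv6 caches from
# /proc/slabinfo and compute the memory they use, as done by hand above.
def ipv6_route_memory(slabinfo_path="/proc/slabinfo"):
    usage = {}
    with open(slabinfo_path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] in ("fib6_nodes", "ip6_dst_cache"):
                num_objs, objsize = int(fields[2]), int(fields[3])
                usage[fields[0]] = num_objs * objsize
    return usage

usage = ipv6_route_memory()
print(usage, "total: %.1f MiB" % (sum(usage.values()) / 2**20))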

Conclusion The takeaways from this article are:
  • upgrade to Linux 4.2 or more recent to avoid excessive caching,
  • route lookups are noticeably slower compared to IPv4 (by an order of magnitude),
  • the CONFIG_IPV6_MULTIPLE_TABLES option incurs a fixed penalty of 100 ns per lookup,
  • memory usage is fair (20 MiB for 40,000 routes).
Compared to IPv4, IPv6 in Linux doesn't foster the same interest, notably in terms of optimizations. Hopefully, things are changing as its adoption and use at scale are increasing.

  1. For a given destination prefix, it's possible to attach source-specific prefixes:
    ip -6 route add 2001:db8:1::/64 \
      from 2001:db8:3::/64 \
      via fe80::1 \
      dev eth0
    
    Lookup is first done on the destination address, then on the source address.
  2. This is quite different from the classic scenario where Linux acts as a gateway for a /64 subnet. In this case, the neighbor subsystem stores the reachability information for each host and the routing table only contains a single /64 prefix.
  3. The measurements are done in a virtual machine with one vCPU and no neighbors. The host is an Intel Core i5-4670K running at 3.7 GHz during the experiment (CPU governor set to performance). The benchmark is single-threaded. Many lookups are performed and the result reported is the median value. Timings of individual runs are computed from the TSC.
  4. Most of the packets in the network are expected to be routed to a destination. However, this also means the backtracking code path is not used in the /128 and /48 scenarios. Having a fallback route gives far different results and makes it difficult to ensure we explore the address space correctly.
  5. The exact same optimizations could be applied for IPv6. Nobody has done it yet.
  6. Compiling out table support effectively removes those last 100 ns.
  7. There are also per-CPU pointers allocated directly (4 bytes per entry per CPU on a 64-bit architecture). We ignore this detail.

19 August 2017

Vasudev Kamath: Writing a UDP Broadcast Receiver as Python Iterator

I had to write a small Python application to listen for some broadcast messages and process them. These broadcast messages are actually a sort of discovery message, used to find some peers in a network. Writing a simple UDP server to listen on a particular port was easy; but while designing the application I was wondering how I could plug this server into my main code. There are two possibilities:
  1. Use the threading module of Python to run the server code in the background and give it a callback to communicate the data to the main thread.
  2. Periodically read some messages from the server code and then dispose of the server.
I didn't like the first approach because I would need to pass a callback function and would somehow end up complicating the code. The second approach sounded sane, but I wanted to make the server behave more like an iterator. I searched around to see if someone had attempted to write something similar, but did not find anything useful (maybe my Googling skills aren't good enough). Anyway, I thought: what's wrong with trying? If it works, I'll be happy that I did something different :-). The first thing needed to make an iterator in Python is to have the functions __iter__ and __next__ defined in your class. For Python 2, the iterator protocol wants next to be defined instead of __next__. So for portable code you can define a next method which in turn calls __next__. So here is my first shot at writing the BroadCastReceiver class.
from socket import socket, AF_INET, SOCK_DGRAM, SOL_SOCKET, SO_REUSEADDR


class BroadCastReceiver:

    def __init__(self, port, msg_len=8192):
        self.sock = socket(AF_INET, SOCK_DGRAM)
        self.sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        self.sock.bind(('', port))
        self.msg_len = msg_len

    def __iter__(self):
        return self

    def __next__(self):
        try:
            # recvfrom returns (data, address); reorder to (address, data)
            data, addr = self.sock.recvfrom(self.msg_len)
            return addr, data
        except Exception as e:
            print("Got exception trying to recv %s" % e)
            raise StopIteration

    # Python 2 compatibility: the old iterator protocol calls next()
    next = __next__
This version of the code can be used in a for loop to read UDP broadcasts from the socket. One problem is that if no packet is received, which might be due to a disconnected network, the loop will just block forever. So I had to modify the code slightly to add a timeout parameter. The changed portion of the code is below.
...
    def __init__(self, port, msg_len=8192, timeout=15):
        self.sock = socket(AF_INET, SOCK_DGRAM)
        self.sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        self.sock.settimeout(timeout)
        self.msg_len = msg_len
        self.sock.bind(('', port))
 ...
Now, if the network is disconnected or no packet is received within the timeout period, we get a socket.timeout exception, which makes us raise StopIteration and causes the for loop using the server as an iterator to exit. This prevents our periodic code from blocking forever when the network is disconnected or no messages are received for a long time (maybe because we are connected to the wrong network). Now everything looks fine, except for one part: if we create the server object each time our periodic code is called, we will have binding issues because we did not properly close the socket once the iterator stopped. So I added socket-closing code in the __del__ function of the class. __del__ will be called when the garbage collector tries to collect the object once it goes out of scope.
...
    def __del__(self):
        self.sock.close()
The server can now be used in a for loop or by passing the server object to the next built-in function. Here are two examples.
r = BroadCastReceiver(5000, timeout=10)
count = 0
for (address, data) in r:
    print('Got packet from %s: %s' % (address, data))
    count += 1
    # do whatever you want with data
    if count > 10:
        break
Here we use a counter variable to track iterations, and after some iterations we exit the for loop. Another way is to use a for loop with a fixed range of iterations, like below.
r = BroadCastReceiver(5000,  timeout=10)
for i in range(20):
    try:
        address, data = next(r)
        # do whatever you want with data
    except:
        break
Here an additional try block was needed inside the for loop to guard the call to next; this handles the timeout (or any other exception) and exits the loop. In the first case this is not needed, as StopIteration is understood by for. Both use cases described above are mostly useful when it is not critical to handle each and every packet (mostly peer discovery) and packets will always be sent. So if we miss some peers in one iteration, we will still catch them in the next iteration. We just need to make sure we provide a big enough counter to catch most peers in each iteration. If it's critical to receive each packet, we can safely move this iterating logic to a separate thread which keeps receiving packets and processes the data as needed (see the sketch below). For now I have tried this pattern mostly with the UDP protocol, but I'm sure that with some modification it can be used with TCP as well. I'll be happy to get feedback from Pythonistas out there on what you think of this approach. :-)
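For that last case, here is one possible shape of the threaded variant (a sketch of mine, not code from the original post): the receiving loop runs in a daemon thread and hands packets to the main thread through a queue.
import queue
import threading

def start_receiver(port, packet_queue, timeout=15):
    def worker():
        while True:
            # Each receiver stops iterating after `timeout` seconds of
            # silence; close it and create a fresh one to keep listening.
            receiver = BroadCastReceiver(port, timeout=timeout)
            for addr, data in receiver:
                packet_queue.put((addr, data))
            receiver.sock.close()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

packets = queue.Queue()
start_receiver(5000, packets)
# The main code can now pick up packets whenever it is convenient:
# addr, data = packets.get(timeout=30)
With this, the main loop never misses a packet while it is busy doing other work.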
Update: I got a suggestion from Ryan Nowakowski to make the server object a context manager and close the socket in __exit__, as it can't be guaranteed that __del__ will be called for objects that still exist when the interpreter exits. So I slightly modified the class to add __enter__ and __exit__ methods like below, and removed __del__.
...
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.sock.close()
The usage pattern is slightly modified because of this: we need to use a with statement when creating the object.
with BroadCastReceiver(2000) as r:
    # use server object as you wish
    ...
It is also possible to cleanly close the socket without adding a context manager, by closing it in __next__ as soon as iteration stops. The modified code, without a context manager, looks like below.
def __next__(self):
    try:
        # recvfrom returns (data, address); reorder to (address, data)
        data, addr = self.sock.recvfrom(self.msg_len)
        return addr, data
    except Exception as e:
        print("Got exception trying to recv %s" % e)
        # Close the socket before stopping, so the port is released as
        # soon as the iterator is exhausted. (A bare finally clause would
        # also run after every successful return and close the socket
        # after the first packet.)
        self.sock.close()
        raise StopIteration
Here the socket is closed in the except block just before StopIteration is raised, so it is released as soon as the iterator stops producing packets.
